Abstract

As the number of processors in distributed-memory multiprocessors grows, efficiently supporting a shared-memory programming model becomes difficult. We have designed the Protocol for Hierarchical Directories (PHD) to allow shared-memory support for systems containing massive numbers of processors. PHD eliminates bandwidth problems by using a scalable network, reduces hot spots by not relying on a single point to distribute blocks, and uses a scalable amount of space for its directories. PHD provides a shared-memory model by synthesizing a global shared memory from the local memories of processors. PHD supports sequentially consistent read, write, and test-and-set operations.

This thesis also introduces a method of describing locality for hierarchical protocols and employs this method in the derivation of an abstract model of the protocol behavior. An embedded model, based on the work of Johnson [13], describes the protocol behavior when mapped to a k-ary n-cube. The thesis uses these two models to study the average height in the hierarchy that operations reach, the longest path messages travel, the number of messages that operations generate, the inter-transaction issue time, and the protocol overhead for different locality parameters, degrees of multithreading, and machine sizes.

We determine that multithreading is only useful for approximately two to four threads; any additional interleaving does not decrease the overall latency. For small machines and high-locality applications, this limitation is due mainly to the length of the running threads. For large machines with medium to low locality, this limitation is due mainly to the protocol overhead being too large.

Our study using the embedded model shows that in situations where the run length between references to shared memory is at least an order of magnitude longer than the time to process a single state transition in the protocol, applications exhibit good performance. If separate controllers for processing protocol requests are included, the protocol scales to 32k-processor machines as long as the application exhibits hierarchical locality: at least 22% of the global references must be satisfiable locally, and at most 35% of the global references may reach the top level of the hierarchy.

Thesis Supervisor: Dr. William J. Dally

Title: Associate Professor of Electrical Engineering and Computer Science
Chapter 1


Introduction


The shared-memory model has been a convenient programming paradigm for multiprocessors. As the number of elements in multiprocessors grows, however, efficiently supporting a shared-memory programming model becomes difficult. Bus-based snooping schemes suffice for only small numbers of processors; they are inadequate for large numbers of processors because their bandwidth does not grow with the number of processors [8]. Directory-based caching schemes, on the other hand, allow sharing among large numbers of processors when implemented on network-based computers; the bandwidth of the network must increase with the number of processors. Hierarchical directory-based schemes have the potential to scale indefinitely because they have neither the space requirements of full-map directory schemes, nor the limit on the number of copies imposed by limited-directory schemes, nor the linear dependence of invalidation on the number of copies found in chained schemes. Hierarchical schemes additionally exploit the spatial and temporal locality of processes running on a machine. We have designed the Protocol for Hierarchical Directories (PHD) to provide shared memory for systems composed of massive numbers of processors.

PHD synthesizes a global shared memory from the private local memories of processors. Processors access global addresses in the same manner as they access local ones. The system operates on blocks (or lines) consisting of several words of data, capitalizing on spatial locality. PHD maintains sequential consistency [14] in its support of read, write, and test-and-set operations.

In this thesis we also introduce a method of describing locality for hierarchical protocols. We derive an abstract model of the behavior of the Protocol for Hierarchical Directories using this method and use it to study the average height reached in the hierarchy per operation, the longest path of messages traveled per operation, and the number of messages generated per operation for different machine configurations. We validate this model using a trace-driven simulator.

The abstract model is used to generate inputs to an embedded model. The embedded model describes how the protocol behaves when mapped to a k-ary n-cube using our proposed mapping scheme.


1.1 Directory-Based Protocols

Many of the ideas used in the Protocol for Hierarchical Directories evolved from directory-based protocol research as well as early hierarchical protocols. Most of the early research assumed certain capabilities, such as a broadcast ability, which do not scale well. Several other hierarchical protocols have been proposed since the start of this work.

1.1.1 Flat Directory-Based Protocols

All directory-based protocols keep a record associated with each block of main memory. A wide variety of directory schemes have been proposed and studied. Tang [26] proposed a write-back scheme in which the main memory and every cache must keep a directory. In order to find a block, all of the individual directories need to be checked. Censier and Feautrier [7] first proposed the concept of a “dirty bit,” which indicates whether or not the value stored in main memory is the newest one. They also added a bit vector to the main memory directory indicating which caches have copies of the block. These additions eliminated the need to search every local directory after every data modification.
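To make the Censier and Feautrier organization concrete, the following Python sketch models a single full-map directory entry: one presence bit per cache plus a dirty bit. The class and method names are hypothetical, and the write-back and invalidation messages themselves are not modeled; the point is only that a write can read the set of caches to invalidate directly from the bit vector instead of searching every local directory.

```python
from dataclasses import dataclass


@dataclass
class FullMapEntry:
    """Sketch of a full-map directory entry for one block of main memory."""

    num_caches: int
    presence: int = 0    # bit i set => cache i holds a copy of the block
    dirty: bool = False  # True => main memory is stale; the single holder has the newest value

    def sharers(self) -> list[int]:
        return [i for i in range(self.num_caches) if (self.presence >> i) & 1]

    def read_miss(self, cache: int) -> None:
        # If the block is dirty, the holder would first write it back (not
        # modeled), after which main memory is current again.
        self.dirty = False
        self.presence |= 1 << cache

    def write(self, cache: int) -> list[int]:
        """Return the caches that must be invalidated, then record the writer
        as the only (dirty) holder. No search of every cache is required."""
        victims = [i for i in self.sharers() if i != cache]
        self.presence = 1 << cache
        self.dirty = True
        return victims


entry = FullMapEntry(num_caches=8)
entry.read_miss(1)
entry.read_miss(4)
print(entry.write(4))  # [1]
```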

Agarwal [4] discussed these schemes and their lack of scalability due either to the need to broadcast or to the presence of a bottleneck. He mentioned the idea of distributing the directory across the memories in order to prevent any bottlenecks. Chaiken [8] showed that directories are scalable and that, for many applications, some shared-data caching schemes perform better than schemes which cache only private data. The shared-data schemes that he looked at include limited directory, full map, and singly- and doubly-linked chains.
Limited-directory schemes employ a limited number of pointers to keep track of which processors have copies of particular blocks; when a new processor wants a copy and the limited number of pointers have all been allocated, the scheme must resort to broadcast or to invalidation. In a full-map directory, all processors can have copies of any block. Singly-linked chains distribute the directory entry, threading it through the processors which have copies. Doubly-linked chains use a double linkage to allow the chain to be followed either way.
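A limited-pointer entry can be sketched in the same spirit. This Python illustration, with hypothetical names, keeps at most `max_pointers` sharer ids and, when they are all allocated, resorts to invalidation by evicting the oldest sharer. That victim policy is an assumption made for the example; falling back to broadcast is the other option the text mentions.

```python
class LimitedPointerEntry:
    """Sketch of a limited-pointer directory entry that invalidates on overflow.

    Evicting the oldest sharer is an illustrative policy choice only.
    """

    def __init__(self, max_pointers: int) -> None:
        self.max_pointers = max_pointers
        self.pointers: list[int] = []  # processor ids holding copies, oldest first

    def add_sharer(self, proc: int) -> list[int]:
        """Record that `proc` now holds a copy; return processors to invalidate."""
        if proc in self.pointers:
            return []
        invalidated = []
        if len(self.pointers) == self.max_pointers:
            # All pointers are allocated: free one by invalidating that copy.
            invalidated.append(self.pointers.pop(0))
        self.pointers.append(proc)
        return invalidated


entry = LimitedPointerEntry(max_pointers=2)
entry.add_sharer(3)
entry.add_sharer(5)
print(entry.add_sharer(7))  # [3]
print(entry.pointers)       # [5, 7]
```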


1.1.2 Hierarchical Schemes

In the above-mentioned directory schemes, the home location of a particular block is statically fixed: any processor without a copy of the block in its cache which wishes to access that block must look in a single fixed location. Hierarchical directory schemes were designed both to reduce this static requirement by providing adaptive data migration and to solve the limited bandwidth problem of single-bus schemes.

A hierarchical scheme, in general, has a tree structure. At the lowest level of the tree are processors with caches; at the other levels are directories recording which blocks are cached by nodes located physically below them in the tree. Any number of copies of a block are allowed to exist at a time. A read request is typically propagated up the tree until a copy is located; a write typically involves locating and invalidating all of the extra copies by tree traversal and then performing the write.
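The generic tree organization just described can be illustrated with a small Python model of one block in a k-ary hierarchy of height `height`. It is a sketch under simplifying assumptions, not any specific published protocol: each directory is a set of child indices whose subtrees hold a copy, a read climbs until some directory knows of a copy and then descends to it, and a write invalidates every other copy. Messages, locking, and concurrency are ignored, and `TreeDirectory`, `add_copy`, `read`, and `write` are illustrative names.

```python
from collections import defaultdict


class TreeDirectory:
    """One block in a k-ary directory tree with k**height leaf caches.

    Directory node (h, i) covers leaves i*k**h .. (i+1)*k**h - 1 and records
    which of its children, identified by their level-(h-1) index, hold a copy.
    """

    def __init__(self, k: int, height: int) -> None:
        self.k, self.height = k, height
        self.copies: set[int] = set()
        self.dirs: defaultdict = defaultdict(set)

    def add_copy(self, leaf: int) -> None:
        """Record a copy at `leaf` in every directory on its path to the root."""
        self.copies.add(leaf)
        for h in range(1, self.height + 1):
            self.dirs[(h, leaf // self.k**h)].add(leaf // self.k**(h - 1))

    def read(self, leaf: int) -> tuple[int, int]:
        """Return (holder, hops) and record `leaf` as a new copy."""
        if leaf in self.copies:
            return leaf, 0
        for h in range(1, self.height + 1):       # climb until a directory knows of a copy
            if self.dirs[(h, leaf // self.k**h)]:
                break
        else:
            raise LookupError("no cached copy; memory fetch not modeled")
        idx = leaf // self.k**h
        for level in range(h, 0, -1):              # descend toward one holder
            idx = min(self.dirs[(level, idx)])
        self.add_copy(leaf)
        return idx, 2 * h

    def write(self, leaf: int) -> list[int]:
        """Invalidate every other copy; return the invalidated leaves."""
        victims = sorted(self.copies - {leaf})
        self.copies.clear()
        self.dirs.clear()
        self.add_copy(leaf)
        return victims


tree = TreeDirectory(k=2, height=3)   # 8 leaf caches
tree.add_copy(6)
print(tree.read(7))    # (6, 2)
print(tree.read(0))    # (6, 6)
print(tree.write(0))   # [6, 7]
```

The hop count reported by `read` covers only the request's climb and descent; how the data then travels back depends on the particular scheme.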

In [30] Wilson proposes the first hierarchical multiprocessor architecture. He suggests modifications to several bus-based schemes in order to form a protocol for his proposed hierarchy, which uses shared buses of caches to form the tree. He does not, however, consider how his ideas would work on very large-scale systems. Archibald, in [5], proposes another solution intended for a small hierarchy of buses, remarking that his protocol is feasible for a two-level hierarchy, but not necessarily for a three-level or four-level one.

Haridi and Hagersten [12] later proposed a hierarchical scheme which was designed for much larger systems: the Data Diffusion Machine (DDM). Their architecture also assumes a tree composed of buses, which forces all requests to be routed through the hierarchy. The intermediate-level directories store information as to whether copies of each block cached below are also cached anywhere above or whether the block is exclusive to that subtree, thus allowing them to reduce traffic on the higher-level buses during writes. Their architecture also eliminates the need for a home location for a block. Their hierarchical scheme typifies a COMA, or Cache-Only Memory Architecture, as defined in Stenström's paper [25].

In a later paper, Yang, Thangadurai and Bhuyan [32] proposed a similar hierarchical bus scheme which also keeps track of the exclusivity status for each block. Unlike Haridi's scheme, however, they assume that the main memory is situated at the top of the hierarchy, providing a static place for blocks to be stored in when they are thrown out of the caches.

Scott and Goodman describe a hierarchical scheme for processors connected using a k-ary n-cube network in [23] and [24]. Their mapping scheme provides rings, which replace buses as the broadcast method for their protocol. They also introduce the concept of pruning caches, which eliminate the need, shared by all of the earlier protocols, for complete multi-level inclusion, i.e., keeping a higher-level directory entry for every lower-level one. Pruning caches allow a tradeoff between directory size and network bandwidth to be made dynamically, and could be added to PHD.

Ma, Pradhan, and Thiebaut [18] [19] are currently working on a hierarchical directory scheme for non-bus-based architectures, but have not yet fully worked out the details of the protocol.

The Kendall Square Research Company has built a system with a ring-based hierarchical directory scheme [6]. In their scheme multi-level inclusion is required. They have not released much information about their protocol.

Parthasarathy [21] studied an earlier version of PHD, DHP, described in [29]. Although his refined version of DHP traverses the hierarchy fewer times than does PHD, it will deadlock under certain conditions. Parthasarathy's work does not consider this problem. His protocol also does not guarantee that a read operation will make enough progress to complete, even in the absence of deadlock, since write operations can skip ahead of reads indefinitely.
1.1.3 PHD

The Protocol for Hierarchical Directories is a tree-based hierarchical directory protocol. Any number of processors can have read-only copies of a block in their caches. To find a block, a processor sends a location message which travels upwards until a node which knows of a copy is found. This node sends a message which travels downwards until it reaches a node with a copy. The node with a copy sends the block directly to the requesting node, which then sends a confirmation message upwards to indicate that it has finished its read. In this manner, reads can be satisfied in the lowest common subtree containing the requester and a copy of the block.
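The read path can be traced with a short Python sketch. It assumes a complete k-ary tree whose leaves are numbered from 0, counts one message per tree edge, and assumes the confirmation retraces the requester's upward path to the root of the lowest common subtree. The message kinds `find_lowest_common_for_read` and `read` follow Table 2.4; `data` and `confirm` are hypothetical labels.

```python
Node = tuple[int, int]   # (level, index); leaves are at level 0


def common_subtree_height(a: int, b: int, k: int) -> int:
    """Height of the lowest subtree of a complete k-ary tree containing leaves a and b."""
    h = 0
    while a != b:
        a, b, h = a // k, b // k, h + 1
    return h


def phd_read_trace(requester: int, holder: int, k: int) -> list[tuple[str, Node, Node]]:
    """Messages of one read as (kind, source, destination), in issue order."""
    h = common_subtree_height(requester, holder, k)
    up = [((lvl, requester // k**lvl), (lvl + 1, requester // k**(lvl + 1)))
          for lvl in range(h)]
    down = [((lvl, holder // k**lvl), (lvl - 1, holder // k**(lvl - 1)))
            for lvl in range(h, 0, -1)]
    trace = [("find_lowest_common_for_read", src, dst) for src, dst in up]
    trace += [("read", src, dst) for src, dst in down]
    trace.append(("data", (0, holder), (0, requester)))   # direct delivery, one message
    trace += [("confirm", src, dst) for src, dst in up]    # assumed to retrace the upward path
    return trace


for message in phd_read_trace(requester=0, holder=3, k=2):
    print(message)
```

For requester 0 and holder 3 in a binary tree the lowest common subtree has height 2, so the trace contains two upward messages, two downward messages, one direct data message, and two confirmation messages.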

Write operations involve finding all of the copies of a block in the system and deleting them. Only the nodes in the smallest subtree containing all copies of the block and the write requester are involved in a write. The owner of the block transfers ownership to the node requesting the write. Acknowledgments of deletion from all of the nodes which previously had copies are combined, and an acknowledge message is sent downwards to the node that requested the write. The test-and-set operation is actually a test-and-test-and-set operation; it is implemented as an optimized combination of the read and write operations.
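The set of nodes involved in a write can be computed in a similar way. In the Python sketch below (hypothetical names, leaves numbered from 0 in a complete k-ary tree), repeatedly dividing leaf numbers by k until the writer and all copy holders coincide yields the root of the smallest subtree containing them; nodes outside that subtree take no part in the write.

```python
def write_subtree(writer: int, copies: set[int], k: int) -> tuple[int, int]:
    """Return (level, index) of the smallest subtree of a k-ary tree that
    contains the writer and every leaf currently holding a copy."""
    leaves = {writer} | copies
    level = 0
    while len(leaves) > 1:
        leaves = {leaf // k for leaf in leaves}   # move every leaf up one level
        level += 1
    (index,) = leaves
    return level, index


level, index = write_subtree(writer=5, copies={4, 7}, k=2)
first, last = index * 2**level, (index + 1) * 2**level - 1
print(level, index, first, last)  # 2 1 4 7
```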

1.2 Analysis and Locality Modeling

Many previous models [15] [28] [31] of hierarchical cache consistency protocols have modeled bus architectures and, as such, considered bus traffic effects to be most important. Leutenegger and Vernon, in [15], assume uniform cache miss rates across the machine.

Yang [31] assumes a single-level clustering model for reference rates, in which each smallest group of processors is equally likely to access some blocks and all other processors are equally, but less, likely to access those blocks. Vernon, Jog, and Sohi [28] do not directly consider data locality; instead, they choose fixed miss ratios for the caches at different levels.

Scott, in [24], actually calculates the traffic for a ring-based hierarchical system. He assumes best-case, worst-case, and random data-access patterns in his study.

Johnson [13] studies locality and its effects on multiprocessor performance. He derives a combined model of applications, communication mechanisms, and interconnection networks, and uses the result to show that “exploiting communication locality provides gains which are at most linear in the factor by which average communication distance is reduced,” as long as the outstanding number of communications per processor is bounded. We use his model as the basis for the embedded model studies of Chapter 6.

Stenström, Joe, and Gupta [25] compare the performance of a COMA architecture with that of a NUMA (non-uniform memory access) architecture. They find that the COMA architecture performs worse than the NUMA one in many situations, such as those where coherence misses dominate over capacity misses. Many of their assumptions, however, do not apply to the work described in this thesis. They assume a 16-processor configuration, where the effects of locality are less important than on a massively parallel machine. They also assume that the COMA architecture will be running Haridi and Hagersten's DDM protocol. PHD, on the other hand, as will be explained in Section 1.4, not only uses a shorter path to satisfy read requests, but also eliminates the replacement problem of the DDM protocol. The paper concludes with their proposal of COMA-F, a flat COMA architecture. COMA-F, like PHD, has a master (owner) node.

1.3 J-Machine

The cache coherence protocol was designed as part of the J-Machine [9] project at MIT.

The J-Machine is a massively parallel, fine-grained, message-passing concurrent computer. Although the J-Machine was designed to efficiently support a message-passing language, it provides inexpensive synchronization primitives to support other programming models as well. The cache coherence protocol was developed in the context of considering shared-memory programming environments for the J-Machine.

1.4 Contributions

The Protocol for Hierarchical Directories differs in several ways from previously proposed hierarchical schemes. It is designed for message-passing multicomputer systems which use small cache block sizes. It is both scalable and strongly coherent.
1.4.1 Scalability

PHD is scalable in cost and latency, as defined by Scott in [23]. He requires that cost grow slower than $O(N^2)$ with machine size, and that latency grow no faster than $O(\sqrt{N})$.

Cost: The cost of the hardware includes the cost of the network and the cost of the directory, which stores tag bits. A k-ary n-cube, as long as its dimensionality increases appropriately with size, grows at less than $O(N^2)$. The directory overhead for PHD is $O(N \log N)$ by Scott's definitions. Therefore, PHD is scalable in cost.

Latency: As shown in Chapter 5, the unloaded-network predicted latency per read or write operation scales at less than $O(\sqrt{N})$. The latency due to protocol overhead for the proposed embedding of PHD into a k-ary n-cube depends on the total number of messages sent and thus on the degree of sharing.

Bottleneck at the Top of the Hierarchy: Unlike in a bus-based hierarchical architecture, requests which span the machine are not constrained to cross through the same point. PHD distributes the levels of the hierarchy across the nodes of the machine. There is no bottleneck at “the” top directory, because different blocks have different top directories. This mapping is described in Section 2.2.

1.4.2 Messages and Longest Path Traversed

Because PHD is not restricted by the machine architecture to follow the hierarchy at all times, the longest path traveled per read is shorter, and fewer messages are generated per read, than in an enforced hierarchy.

Longest Path per Operation: As shown in Figure 1.1, a read in PHD is satisfied directly after two traversals of the hierarchy and a single direct message to deliver the data. Strict hierarchies require four hierarchy traversals before a read result can be used.

Messages per Operation: The number of messages per read operation is also smaller in PHD than in a standard hierarchy. As Figure 1.2 illustrates, only three hierarchy traversals' worth of messages plus one direct data-delivery message need to be sent for PHD, as opposed to four hierarchy traversals' worth of messages for the strict hierarchy.
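The traversal counts quoted above can be written as functions of h, the height of the lowest subtree containing the requester and the copy. This Python sketch assumes that one traversal of the hierarchy costs h messages, one per tree edge, and that the direct data message counts as one; it restates the text's counts rather than measuring the protocol.

```python
def read_costs(h: int) -> dict[str, tuple[int, int]]:
    """(longest path, total messages) for a read whose requester and copy
    share a lowest common subtree of height h."""
    return {
        # up h + down h, then one direct data message; the confirmation adds h more messages
        "PHD": (2 * h + 1, 3 * h + 1),
        # assumed: request up and down, then the data retraces the tree up and down
        "strict": (4 * h, 4 * h),
    }


for h in range(1, 5):
    print(h, read_costs(h))
```

Under this counting PHD's longest path is shorter for every h ≥ 1, and its message count is strictly smaller once h ≥ 2; at h = 1 both schemes send four messages.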

1.4.3 Ownership

The concept of ownership [10] as used in this protocol was derived from both Li [16] [17] and Totty [27]. An owner of a block is responsible for it. Any other node can only have a copy of the block, which can be asynchronously thrown away in order to make room for other blocks. That node can then inform the rest of the system at its leisure without affecting the correctness of the protocol. This ability to throw away unneeded copies of blocks without the global transactions required by Haridi and Hagersten results in less time needed to invalidate blocks when caches are full.

1.4.4 Mapping Scheme

This thesis proposes a mapping scheme designed to map hierarchical cache coherence protocols onto non-toroidal k-ary n-cubes. This scheme allows easy calculation of parent and child nodes, and is designed to confine communication to the area of the network containing participating nodes.
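The actual mapping is given in Section 2.2. To illustrate why such a scheme can make parent and child calculation cheap, here is a hypothetical alternative, not the thesis's scheme: in a non-toroidal n-dimensional mesh whose side is a power of two, let each tree node at height h correspond to an aligned sub-cube of side 2^h, so the tree has arity 2^n. Parents and children then follow from bit shifts of coordinates, and all traffic within a subtree stays inside its sub-cube. All function names are illustrative.

```python
from itertools import product

Coord = tuple[int, ...]


def region_of(node: Coord, level: int) -> Coord:
    """Level-`level` region (aligned sub-cube of side 2**level) containing `node`."""
    return tuple(c >> level for c in node)


def parent_region(region: Coord) -> Coord:
    """Enclosing region one level up: halve every coordinate."""
    return tuple(c >> 1 for c in region)


def child_regions(region: Coord) -> list[Coord]:
    """The 2**n regions one level down, one per combination of low-order bits."""
    return [tuple((c << 1) | bit for c, bit in zip(region, bits))
            for bits in product((0, 1), repeat=len(region))]


def max_hops_within(level: int, n: int) -> int:
    """Worst-case mesh distance between two nodes of one level-`level` region."""
    return n * (2**level - 1)


print(region_of((5, 2), 1), parent_region((2, 1)), region_of((5, 2), 2))
print(child_regions((1, 0)))
```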

1.4.5 Locality Model

This thesis introduces a method of describing locality for hierarchical cache-coherent protocols and incorporates this method in a model. The thesis also shows how the method can be used to accurately predict the longest path traveled per operation and the number of messages sent per operation.

1.4.6 Embedded Model

This thesis also introduces a model for describing the behavior of PHD as embedded into a k-ary n-cube. This model is based on the work of Johnson [13], and models applications, processors, and networks. The model is used to show that the embedding will scale well for applications with moderate locality in situations where the number of cycles needed to process the protocol transactions is small.
1.5 Thesis Overview

The focus of this thesis is the description and the modeling of a hierarchical, directory-based cache coherence protocol. Chapter 2 describes the Protocol for Hierarchical Directories in moderate detail and proposes an embedding of the protocol into a k-ary n-cube. Chapter 3 discusses some of the issues involved in designing hierarchical protocols. Chapter 4 outlines the simulator used to test and explore PHD; this chapter also explains the simulator verifier.

Two models were used to study the protocol. The abstract model, which considers the protocol running on an abstract hierarchy, is described in Chapter 5. Chapter 6 extends this model to show how the protocol behaves when embedded as proposed in Chapter 2.

Chapter 7 concludes the thesis, outlining areas of future research.



Chapter 2


Protocol Overview 


This chapter provides a description of the behavior of the Protocol for Hierarchical Directories. A table listing the exact behavior of the protocol is located in Appendix G. This chapter also briefly outlines a method of mapping a hierarchy to a k-ary n-cube. The next chapter discusses the issues involved in the design of a hierarchical protocol.

2.1 Protocol Description

This section explains the operations used by PHD to ensure consistency while coordinating the global read, write, and test-and-set operations. Section 2.1.1 briefly describes the operation of the protocol. Sections 2.1.2 and 2.1.3 outline the definitions and the notation used in the description of the protocol. Sections 2.1.4, 2.1.5, and 2.1.6 explain the protocol in more detail, describing the read, write, and test-and-set operations, respectively.

2.1.1 Overview

PHD supports three essential global primitives: read, write, and test-and-set. Any number of nodes can have read-only copies of a block in their caches. To find a block, a node asks its parent for a copy. The parent must know which of its child subtrees have copies. If none do, it forwards the message upwards. If one does, the read message is forwarded to it. Read operations can therefore be satisfied locally.

Write operations involve finding all of the copies of a block in the system and deleting them. Only the nodes in the smallest subtree which contains all copies of the block and the write requester are involved in the write process. Acknowledgments of deletion from all of the nodes which previously had copies are combined, and an acknowledge message is sent down to the node requesting the write. The owner of the block transfers ownership and a valid copy of the block to the write requester. The test-and-set operation is actually a test-and-test-and-set operation; it is implemented as an optimized combination of the read and write operations.
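Test-and-test-and-set itself is a standard idiom, sketched here in Python independently of PHD's message protocol. A thread first spins with ordinary reads, which in a cache-coherent system can be served from a locally cached read-only copy, and issues the write-like test-and-set only when the word appears free. The `threading.Lock` inside `SharedWord` merely emulates an atomic test-and-set instruction; `SharedWord`, `ttas_acquire`, and `clear` are hypothetical names.

```python
import threading
import time


class SharedWord:
    """A shared lock word; the internal Lock only emulates an atomic instruction."""

    def __init__(self) -> None:
        self.value = 0
        self.tas_issued = 0                # write-like test-and-set attempts
        self._atomic = threading.Lock()

    def read(self) -> int:
        return self.value

    def test_and_set(self) -> int:
        with self._atomic:
            self.tas_issued += 1
            old, self.value = self.value, 1
            return old

    def clear(self) -> None:
        with self._atomic:
            self.value = 0


def ttas_acquire(word: SharedWord) -> None:
    while True:
        while word.read() != 0:            # test: spin with ordinary reads only
            time.sleep(0)                  # yield so the holder can make progress
        if word.test_and_set() == 0:       # test-and-set: only when the word looked free
            return


lock, counter = SharedWord(), [0]


def worker() -> None:
    for _ in range(200):
        ttas_acquire(lock)
        counter[0] += 1                    # critical section
        lock.clear()


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter[0], lock.tas_issued)
```

Counting `tas_issued` shows how many write-like operations were actually attempted; with plain test-and-set, every spin iteration would be one.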

2.1.2 Definitions

There are two types of directory entries in the hierarchy. The first type, a leaf-level entry, represents an actual block of cached data and would be found in the memory of a node at the bottom of the hierarchy. The second type, a parent entry, is a directory that stores information about which child subtrees have copies of a particular block. The parent entries correspond to memory on some node of the hierarchy which is not at the leaf level.

Every cache entry on a leaf node may be purgeable or unpurgeable. Purgeable entries may be deleted at any time. One copy of every block must never be deleted; the node designated as the owner is responsible for keeping this master copy until ownership is passed on. The only purgeable entries are those which are in the readable state and yet not owned. A full list of the possible states of a leaf entry is shown in Table 2.1.
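The purge rule reduces to a one-line predicate over the leaf states. The Python sketch below, with hypothetical function names and state names taken from Table 2.1, picks a block that may be dropped silently to make room in a cache. Owned or in-transition entries are never chosen, since the owner must first pass ownership on, which is not modeled here.

```python
from typing import Optional


def is_purgeable(state: str) -> bool:
    """Only a readable copy that this node does not own may be dropped at any time."""
    return state == "readable_nowner"


def choose_victim(cache: dict[int, str]) -> Optional[int]:
    """Return the lowest block address whose entry can be discarded silently.

    Owned and in-transition entries are never chosen; evicting an owned block
    would first require passing ownership on, which this sketch does not model.
    """
    purgeable = [addr for addr, state in cache.items() if is_purgeable(state)]
    return min(purgeable) if purgeable else None


cache = {0x40: "writable", 0x80: "readable_nowner", 0xC0: "readable_yowner"}
print(choose_victim(cache))   # 128
```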

A parent entry consists of a vector containing two bits of state for every child subtree, three additional bits of state, and a pointer to the subtree that the current write request, if any, was sent from. The entry will indicate which of four possible states each child subtree is in: invalid, confirmed, valid, or waiting. The invalid state means that there is no copy of that block in that subtree. The confirmed state means that either at least one node in that subtree has a copy of that block or somewhere below a message is propagating upwards indicating that the block has been deleted. The valid state means that an operation is occurring in the subtree that will eventually make the subtree confirmed. The waiting state means that the subtree has at least one node which is waiting for the result of a read, and that the parent entry needs to send the block down to that node upon receiving the data.
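The parent entry just described fits in a small number of bits. This Python sketch assumes a particular 2-bit encoding of the four child states and assumes the write-request pointer only needs to name one of the k child subtrees, so the pointer is ceil(log2 k) bits wide; under those assumptions an entry for a k-ary node takes 2k + 3 + ceil(log2 k) bits. The names are hypothetical.

```python
CHILD_STATES = ("invalid", "confirmed", "valid", "waiting")   # 2-bit codes 0..3 (assumed encoding)


def parent_entry_bits(k: int) -> int:
    """Bits per parent entry: 2 per child subtree, 3 extra state bits, and a
    pointer naming one of the k child subtrees (assumed pointer width)."""
    return 2 * k + 3 + (k - 1).bit_length()


def pack_vector(states: list[str]) -> int:
    """Pack the per-child states into one integer, child 0 in the low bits."""
    word = 0
    for i, state in enumerate(states):
        word |= CHILD_STATES.index(state) << (2 * i)
    return word


def unpack_vector(word: int, k: int) -> list[str]:
    return [CHILD_STATES[(word >> (2 * i)) & 0b11] for i in range(k)]


vec = ["invalid", "confirmed", "valid", "waiting"]
print(pack_vector(vec), parent_entry_bits(4), parent_entry_bits(8))   # 228 13 22
```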

The waiting state is employed by the protocol to support read combining. A subtree vector
State                                     Description
readable_yowner                           Entry is readable. Node is owner.
readable_nowner                           Entry is readable. Node is not owner.
waiting_for_read                          Node is waiting for a read to complete. Node is not owner.
writable                                  Entry is writable. Node is owner.
waiting_for_write_nowner_npl_nread        Node is waiting for a write to complete. Node is not owner. Invalidation has not yet reached this node. Node may not respond to read messages.
waiting_for_write_nowner_npl_yread        Node is waiting for a write to complete. Node is not owner. Invalidation has not yet reached this node. Node has valid value which can be distributed.
waiting_for_write_yowner_npl              Node is waiting for a write to complete. Node is owner. Invalidation has not yet reached this node. Node has valid value which can be distributed.
waiting_for_write_nowner_ypl_nread        Node is waiting for a write to complete. Node is not owner. Invalidation has reached this node. Node may not respond to read messages.
waiting_for_write_nowner_ypl_yread        Node is waiting for a write to complete. Node is not owner. Invalidation has reached this node. Node has valid value which can be distributed.
waiting_for_write_ok_yowner_npl           Node is waiting for a write; only needs final ack. Node is owner. Invalidation has not yet reached this node. Node has valid value which can be distributed.
waiting_for_write_ok_yowner_ypl           Node is waiting for a write; only needs final ack. Node is owner. Invalidation has reached this node. Node has valid value which can be distributed.
waiting_for_write_value_nowner_ypl_nread  Node is waiting for a write; only needs ownership. Node is not owner. Invalidation has reached this node. Node may not respond to read messages.
waiting_for_write_value_nowner_ypl_yread  Node is waiting for a write; only needs ownership. Node is not owner. Invalidation has reached this node. Node has valid value which can be distributed.
waiting_for_tas                           (Full set corresponding to the waiting_for_write set.)

Table 2.1: The possible states of a leaf cache entry.
Combination  Description
v0_w0_cX     All subtrees are either confirmed or invalid.
vX_w0_c0     All subtrees are either valid or invalid.
vX_wX_c0     All subtrees are valid, waiting, or invalid.
vX_w0_cX     All subtrees are valid, confirmed, or invalid.
vX_wX_cX     All subtrees are valid, waiting, confirmed, or invalid.

Table 2.2: The possible combinations of states in the subtree vector of a cache parent entry.


can only have certain combinations of these states, shown in Table 2.2.
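Table 2.2's labels can be read as flags for whether any child subtree is valid (v), waiting (w), or confirmed (c). The Python sketch below assumes that reading, with X meaning at least one subtree in that state and 0 meaning none, and checks whether a vector falls into one of the listed combinations. Under this assumption the table never contains a waiting subtree without a valid one, and an all-invalid vector is not listed. The names are hypothetical.

```python
# Combinations listed in Table 2.2 (X = at least one child subtree in that
# state, 0 = none; this reading of the labels is an assumption).
LISTED = {"v0_w0_cX", "vX_w0_c0", "vX_wX_c0", "vX_w0_cX", "vX_wX_cX"}


def combination_label(vector: list[str]) -> str:
    """Summarize a subtree vector as v?_w?_c? (valid, waiting, confirmed)."""
    def flag(state: str) -> str:
        return "X" if state in vector else "0"
    return f"v{flag('valid')}_w{flag('waiting')}_c{flag('confirmed')}"


def is_listed(vector: list[str]) -> bool:
    return combination_label(vector) in LISTED


print(combination_label(["confirmed", "invalid"]))   # v0_w0_cX
print(is_listed(["waiting", "invalid"]))             # False
```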

All parent entries are marked as either shared or exclusive. An exclusive entry indicates that all copies are within the current subtree. All entries at the top-level node of the hierarchy are, by definition, exclusive. A directory entry on a node which is marked as shared, on the other hand, indicates that there may be a copy outside of the subtree rooted at that node.

A parent entry may be locked or unlocked. If an entry is locked, then all messages which wish to access it must wait until it is unlocked. Messages which unlock an entry are, of course, not required to wait. During a write, when a parent entry is locked, there are two more possible state modifiers a node might have: on_request_path and writer_acknowledged.

If the node containing the parent entry is located on a direct path between the writing node and the top of the write, it is on_request_path. If the parent entry is on the request path of the write, the final state, writer_acknowledged, indicates whether or not the writing subtree has acknowledged the write invalidation. Table 2.3 shows these states.

There are eighteen different messages used by the protocol. They are listed in Table 2.4 and will be explained as they are used.

2.1.3 Notation 

Throughout this chapter, diagrams of trees will be shown. These trees are virtual trees, and do not actually exist on the typical architecture to which PHD will be mapped. The mapping is described in Section 2.2. Except where noted, the figures only consider a single cache block.
State          Description
S_U_NOP_NGA    Entry is also contained in another subsystem (shared).
               Entry is unlocked.
E_U_NOP_NGA    Entry is only contained in this subsystem (exclusive).
               Entry is unlocked.
S_L_NOP_NGA    Entry is also contained in another subsystem (shared).
               Entry is locked.
               Entry is not located on path from writer to top node for the write.
S_L_YOP_NGA    Entry is also contained in another subsystem (shared).
               Entry is locked.
               Entry is located on path from writer to top node for the write.
               Entry has not yet received acknowledgment from the writing subtree.
E_L_YOP_NGA    Entry is only contained in this subsystem (exclusive).
               Entry is locked.
               Entry is located on path from writer to top node for the write.
               Entry has not yet received acknowledgment from the writing subtree.
S_L_YOP_YGA    Entry is also contained in another subsystem (shared).
               Entry is locked.
               Entry is located on path from writer to top node for the write.
               Entry has received acknowledgment from the writing subtree.
E_L_YOP_YGA    Entry is only contained in this subsystem (exclusive).
               Entry is locked.
               Entry is located on path from writer to top node for the write.
               Entry has received acknowledgment from the writing subtree.

Table 2.3: The possible states of a cache parent entry.
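The state names in Table 2.3 appear to concatenate four fields: S or E (shared or exclusive), U or L (unlocked or locked), NOP or YOP (not on, or on, the path from the writer to the top node of the write), and NGA or YGA (acknowledgment from the writing subtree not yet, or already, received). The following Python sketch decodes a name under that assumed convention, which is inferred from the table rather than stated in the text, and checks only the two rules the text does state: the path and acknowledgment modifiers exist only while an entry is locked, and acknowledgment is tracked only for entries on the request path. The class and function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ParentEntryState:
    shared: bool           # S = shared, E = exclusive
    locked: bool           # L = locked, U = unlocked
    on_request_path: bool  # YOP = on the writer's path to the top of the write
    writer_acked: bool     # YGA = writing subtree has acknowledged, NGA = not yet

    def is_consistent(self) -> bool:
        # Path/acknowledgment modifiers only exist while the entry is locked.
        if not self.locked and (self.on_request_path or self.writer_acked):
            return False
        # Acknowledgment is only tracked for entries on the request path.
        return not (self.writer_acked and not self.on_request_path)

def decode(name: str) -> ParentEntryState:
    """Decode a name such as 'S_L_YOP_NGA' (assumed naming convention)."""
    sharing, lock, path, ack = name.split("_")
    if (sharing not in ("S", "E") or lock not in ("U", "L")
            or path not in ("NOP", "YOP") or ack not in ("NGA", "YGA")):
        raise ValueError(f"unrecognized state name: {name}")
    return ParentEntryState(shared=sharing == "S", locked=lock == "L",
                            on_request_path=path == "YOP",
                            writer_acked=ack == "YGA")

TABLE_2_3 = ["S_U_NOP_NGA", "E_U_NOP_NGA", "S_L_NOP_NGA", "S_L_YOP_NGA",
             "E_L_YOP_NGA", "S_L_YOP_YGA", "E_L_YOP_YGA"]
```

Every name in the table decodes to a consistent state; a name such as E_U_YOP_NGA (unlocked yet on a write path) would be rejected by `is_consistent`.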



Figure 2.1: This figure explains the symbols used throughout the chapter: a node with no copy of the data, a node with a valid copy of the data, and a node requesting an operation (which has no copy of the data). Note that a node with a "valid" copy of the block is an imprecise description, basically implying that the node, if a leaf, has a copy of the block in a readable or writable state. If a grey node is not a leaf node, it is assumed to have at least one subtree in the confirmed state.





Message                                   Description
find_lowest_common_for_read               Look upwards for nearest node with value.
redirected_find_lowest_common_for_read    Look again; failed in current subtree.
read                                      Walk downwards to a node with value.
find_lowest_common_for_write              Look for lca of all nodes with value.
lock                                      Lock all nodes below with value.
ack                                       All copies below are invalid.
ackl                                      All copies below except writer's are invalid.
throwing_away                             Subtree below invalid; once was confirmed.
change_to_exclusive                       Node is least common ancestor of all copies.
find_lowest_common_for_tas                Look upwards for nearest node with value.
redirected_find_lowest_common_for_tas     Look again; failed in current subtree.
confirm_value                             Subtree below has gotten a copy of value.
read_data                                 Level = 0: send data directly to reader.
                                          Level > 0: send data to waiting subtrees.
unconfirm_value                           Subtree below now not confirmed, not invalid.
read_tas                                  Walk downwards to a node with value.
write_ok                                  No other copies left in tree.
s_write_own                               Ownership transfer message for writes.
tas_failed                                Test-and-set failed in initial read stage.

Table 2.4: The messages sent by the protocol.
Figure 2.2: This diagram shows the three phases which occur during a read operation. Node 5 wants to read X, so it sends messages to locate X. This first phase, locating a block, finishes when Node 6 is informed that Node 5 wants a copy. At that point, phase two starts, in which a copy is sent directly to Node 5. Finally, in phase three, confirmation that the value arrived is sent to Node 2 and then from Node 2 to Node 1.

At the level of detail of the figures in this chapter, nodes may be in one of three states, as shown in Figure 2.1: invalid, valid, and requesting an operation. These states apply intuitively to both leaf and parent entries. Note that the "valid" state for a parent node is most similar to having a confirmed subtree.

2.1.4 Reading

A read to a locally cached block occurs immediately. On a read miss, however, a three-phase operation must be performed, as sketched in Figure 2.2. The first phase locates the nearest block while simultaneously updating the states in the hierarchy. The second phase sends a copy of the block directly to the requesting node. The third sends a confirmation that the node has actually received the copy. There are two possible complications to a read. First, a write may be going on at the same time. Second, the copy chosen to be replicated may be deleted locally before the read request reaches it. Both of these problems are handled by the protocol.

A node which wishes to locate a block for a read sends a find_lowest_common_for_read message to its parent. If the parent has no record of the block, it sends the same message up, until a node is found that has it. If the block exists, the message traveling upwards will eventually arrive at a node which knows where the block is. If the block does not exist, the protocol will either allocate it automatically or signal an error, whichever it has been configured to do.

Figure 2.3: The states that a leaf node can enter during a read.
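The upward search can be sketched as a walk along parent pointers. In this minimal Python sketch the tree, the node numbers, and the `allocate_missing` switch (standing in for the configurable choice between automatic allocation and an error) are assumptions for illustration, not part of the protocol's definition.

```python
from typing import Dict, List, Optional, Set, Tuple

class BlockNotFound(Exception):
    """Raised when the protocol is configured to signal an error for a missing block."""

def locate_upward(requester: int, parent: Dict[int, Optional[int]],
                  has_record: Set[int], allocate_missing: bool = False
                  ) -> Tuple[List[int], str]:
    """Follow a find_lowest_common_for_read message from `requester` upward.

    Returns the parent nodes the message visits and how the search ended.
    """
    visited: List[int] = []
    node = parent[requester]
    while node is not None:
        visited.append(node)
        if node in has_record:       # this node knows where the block is
            return visited, "found"
        node = parent[node]          # no record: send the same message up
    if allocate_missing:             # reached the top without finding a record
        return visited, "allocated"
    raise BlockNotFound(f"no record of the block above node {requester}")

# Hypothetical tree: node 1 is the top, 2 and 3 are its children,
# leaves 5 and 6 sit under node 2, and leaf 7 sits under node 3.
PARENT = {1: None, 2: 1, 3: 1, 5: 2, 6: 2, 7: 3}
```

If the only copy is at leaf 6 and is recorded at nodes 2 and 1, a read from leaf 5 stops at node 2, returning `([2], "found")`, while a read from leaf 7 climbs through node 3 to node 1, returning `([3, 1], "found")`.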

If the entry at that node is unlocked, and at least one subtree is confirmed but none are valid, the node must update its vector of who has the block and pick one of the confirmed subtrees to send the read message down to. If any of the subtrees are valid, the requesting subtree is marked as waiting, and the read process suspends here. When the valid subtrees are changed to confirmed, indicating that they have finished the read, the values are sent down to all waiting subtrees. This mechanism supports read combining.

When a non-leaf node receives a read message, it changes the state of its entry to shared, since the request must have come from outside of the subtree it heads, and forwards the read message down towards the confirmed subtrees and leaves. When a leaf entry in a readable or writable state receives a read, it sends a purgeable copy (using the read_data message) directly to the requesting node. When the requesting node receives the value, it sends a confirm_value message upwards to its parent, to confirm that it has received a copy of the block.

Complications to a read can occur when a node deletes a block which some other node is trying to read. This deletion may cause a read message to reach a node which no longer knows about the block. In this case, a redirected_find_lowest_common_for_read message is sent upwards to find a different record of the block.

In all of the cases described above, if a node is ever reached whose entry for the block being read is locked, the read is temporarily halted. This halting permits the serialization of reads and writes.

Figure 2.3 shows the states that can occur in a leaf node during a read. When a node wishes to read a block which it has not locally cached, it enters a waiting_for_read state and sends a find_lowest_common_for_read message to its parent. In a normal read case, the node will be informed of the value by a read_data message, and will then enter the readable_nowner state. If, on the other hand, a write starts before the read completes, the node may receive a lock message before the value. The node blocks the write from completing by not acknowledging the lock until it receives the value and completes its read.

Figure 2.4: This diagram illustrates how read combining occurs. As shown in the left tree, Node 5 requests a read just as in Figure 2.2. At any time between the locate message reaching Node 2 and the confirm message reaching it, Node 4 also decides to read the same value, and sends a locate message to its parent, Node 2. Node 2 records that Node 4 is waiting for that block. As shown in the right tree, when the confirmation message from Node 5 reaches Node 2, Node 2 sends the data from the read to Node 4, which responds with a confirmation. Note that only one confirmation is sent from Node 2 to Node 1; confirmation is sent by Node 2 only after receiving confirmation from Node 5.
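The leaf-side behavior just described can be sketched as a small state machine. This Python sketch loosely follows Figure 2.3 and uses the state names from the text (waiting_for_read, readable_nowner); the class, the method names, and the treatment of any other state as a purgeable copy are simplifying assumptions.

```python
from typing import List, Optional, Tuple

class LeafReadEntry:
    """One leaf cache entry during a read miss (simplified sketch)."""

    def __init__(self) -> None:
        self.state = "invalid"
        self.value: Optional[int] = None
        self.lock_pending = False

    def start_read(self) -> List[str]:
        assert self.state == "invalid"
        self.state = "waiting_for_read"
        return ["find_lowest_common_for_read"]

    def on_lock(self) -> List[str]:
        if self.state == "waiting_for_read":
            # Hold the write back: the lock is not acknowledged until the read completes.
            self.lock_pending = True
            return []
        # Assumption: any other state is treated as a purgeable copy (or no copy).
        self.state, self.value = "invalid", None
        return ["ack"]

    def on_read_data(self, value: int) -> Tuple[int, List[str]]:
        """Deliver the value; return it together with the messages to send."""
        assert self.state == "waiting_for_read"
        if self.lock_pending:
            # The read completes with this value, then the block is purged
            # and the delayed lock is finally acknowledged.
            self.state, self.value, self.lock_pending = "invalid", None, False
            return value, ["ack"]
        self.state, self.value = "readable_nowner", value
        return value, ["confirm_value"]
```

In the normal case `on_read_data(42)` returns `(42, ["confirm_value"])`; if `on_lock()` arrived first, it returns `[]` and the later `on_read_data` answers with `["ack"]` instead.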

If several nodes simultaneously try to read a block which is not already widely distributed, the messages will be combined. For example, if three leaf nodes with the same parent try to read block X, they will all send find_lowest_common_for_read messages to their parent. When the first message reaches the parent, it will update its entry to record that the node which sent the first message, Node 1, has a valid (but not confirmed) copy, and forward the request up. The second message to arrive will result in Node 2's state being changed to waiting. This time, of course, the parent node does not forward the request upwards. The third message to arrive will result in a change of Node 3's state to waiting. When the value of X is sent to Node 1, it will send that data to the parent as part of the confirm_value message. The parent will then send the data to all of its subtrees in the waiting state, in this case Nodes 2 and 3. A similar example of read combining is illustrated in Figure 2.4.
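The parent-side rules for reads, including combining, can be sketched as a single directory entry that reacts to find_lowest_common_for_read and confirm_value. The per-child states (invalid, valid, confirmed, waiting) follow the text; the class, method names, and action tuples are hypothetical, and locking, writes, and deletions are left out. The replay in the test below is the three-node example above.

```python
from enum import Enum
from typing import Dict, Hashable, Iterable, List, Tuple

class Child(Enum):
    INVALID = "invalid"
    VALID = "valid"          # a copy is on its way but not yet confirmed
    CONFIRMED = "confirmed"  # the subtree has confirmed receipt of a copy
    WAITING = "waiting"      # asked while another read was still in flight

class ParentEntry:
    """One parent directory entry for a single block (read path only)."""

    def __init__(self, children: Iterable[Hashable]) -> None:
        self.state: Dict[Hashable, Child] = {c: Child.INVALID for c in children}
        self.forwarded_up = False  # did this entry send the request to its own parent?

    def on_locate_for_read(self, child: Hashable) -> List[Tuple]:
        if any(s is Child.VALID for s in self.state.values()):
            # A read is already in flight below: combine with it and suspend.
            self.state[child] = Child.WAITING
            return [("suspend", child)]
        confirmed = [c for c, s in self.state.items() if s is Child.CONFIRMED]
        self.state[child] = Child.VALID
        if confirmed:
            # Send the read down to one confirmed subtree (the first, arbitrarily).
            return [("read", confirmed[0])]
        # No copy below this node: keep looking upwards.
        self.forwarded_up = True
        return [("find_lowest_common_for_read", "parent")]

    def on_confirm_value(self, child: Hashable) -> List[Tuple]:
        self.state[child] = Child.CONFIRMED
        actions: List[Tuple] = []
        for c, s in self.state.items():
            if s is Child.WAITING:
                # confirm_value carries the data, so waiting subtrees get it now.
                self.state[c] = Child.VALID
                actions.append(("read_data", c))
        if self.forwarded_up:
            # Only one confirmation goes up, after the first arrives from below.
            self.forwarded_up = False
            actions.append(("confirm_value", "parent"))
        return actions
```

Replaying the example, only Node 1's request is forwarded up; Nodes 2 and 3 are suspended and then served from Node 1's confirm_value, and a single confirmation goes to the parent's parent.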
Figure 2.5: This is a sketch of the write process. Node 5 wants to write X, so it sends messages to locate X. When the locate messages reach the lowest common ancestor of Node 5 and all nodes that know about X (in this case Node 1), lock messages are sent down to every node which has X. Each leaf node receiving a lock message deletes its copy and sends an acknowledgment upwards. The owner (6) additionally sends a copy of X to 5. When all of the acknowledgments have been collected, Node 5 is sent a message. Node 5 may now update X.


2.1.5 Writing 

A global write involves finding all of the copies of a block in the system, locking them, deleting them, transferring ownership and the current value to the new owner, and then performing the actual write. This process is shown in Figure 2.5.

Several consecutive write requests from a single node to a particular location can be fulfilled quickly and easily. As soon as a node has written to a block once, it has sole ownership and control over that block, and can thus perform consecutive reads or writes locally until another node requests a copy.

A node wishing to write to a block of which it does not have a writable copy must first create an entry in the state waiting_for_write_nowner_npl_nread, or modify an existing entry to be in the waiting_for_write_nowner_npl_yread or waiting_for_write_yowner_npl state, as appropriate. The location phase then begins. The node sends a find_lowest_common_for_write message to the node above it in the hierarchy. If that node has no record of the block, it sends the same message up.

The locate phase continues until the lowest common ancestor (lca) of the block is reached. The calculation of the lowest common ancestor for a block considers all leaf nodes containing the block and the node requesting the write (see Figure 2.6). Note that the lca node is the highest node in the hierarchy that is involved in the write. An lca node receiving a find_lowest_common_for_write message locks its entry, signifying the beginning of the lock phase and ensuring the serialization of writes. It then sends down lock messages to every node which has a copy of the block.

Figure 2.6: The least common ancestor for a block depends not only upon the block, but also upon where the writer is located. The lca for block y, cached in Nodes 5 and 6, from the point of view of Nodes 5, 6, or 2, is Node 2. From the point of view of all of the other nodes, the lca for y is Node 1. The lca for block x, from any node's point of view, is Node 1. The definition of the lca for a block from the perspective of a node N is that the lca is the first node whose entry is tagged exclusive in a path starting from N going up to the node at the highest level of the hierarchy. Nodes 1 and 2 label block y as exclusive. Nodes 2 and 3 label block x as shared. Node 1 labels block x as exclusive.
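The lca definition in the Figure 2.6 caption translates directly into a walk up the tree. This Python sketch assumes a hypothetical layout consistent with that caption (leaves 5 and 6 under Node 2, leaves 7 and 8 under Node 3, and block x assumed to be cached at leaves 6 and 7); leaf nodes carry no shared/exclusive tag, and the function name is illustrative.

```python
from typing import Dict, Optional

def lca_from(node: int, block: str,
             parent: Dict[int, Optional[int]],
             tags: Dict[int, Dict[str, str]]) -> Optional[int]:
    """First node on the path from `node` to the top whose entry for `block`
    is tagged exclusive: the lca from `node`'s point of view."""
    current: Optional[int] = node
    while current is not None:
        if tags.get(current, {}).get(block) == "exclusive":
            return current
        current = parent[current]
    return None  # no exclusive entry found anywhere on the path

# Hypothetical tree and directory tags matching the Figure 2.6 caption.
PARENT = {1: None, 2: 1, 3: 1, 5: 2, 6: 2, 7: 3, 8: 3}
TAGS = {1: {"x": "exclusive", "y": "exclusive"},
        2: {"x": "shared", "y": "exclusive"},
        3: {"x": "shared"}}
```

As in the caption, `lca_from(5, "y", PARENT, TAGS)` is Node 2 while `lca_from(7, "y", PARENT, TAGS)` is Node 1, and every node sees Node 1 as the lca for x.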

Most nodes receiving the lock message will have copies of the block below them.¹ Non-leaf nodes with records of the block lock their entries, and forward the lock messages down to all those nodes below them which have the block.

All of the leaf nodes with copies of the block will receive lock messages. Those with purgeable copies just erase the copies and send an ack message up immediately. The old owner of the block will have an unpurgeable entry. This owner node first sends a copy of the block directly to the node requesting the write, then deletes its copy and sends up an ack message. The main purpose of the copy message is to transfer the ownership of the block and to give the writer the old value, so that it can distribute that value if necessary to the reads serialized before the write. This prevents a deadlock situation, which will be described more fully later.

The node that is in the state waiting_for_write will also have an unpurgeable version of the block. This is the node that requested the write. If two writes were requested at approximately the same time, the one which the lock message records as the writer is the one that won the race. The other write will be waiting on a locked entry somewhere. The winning node sends up a special acknowledgment, ackl, indicating that it is on the path of the write.

¹ If a node has deleted a block and the information that this has happened is still propagating upwards, some nodes may receive a lock message but not have a record of the block. In this case they immediately send up an acknowledgment of deletion.

Figure 2.7: The finite state machine describing a leaf node entry during a write. These states are all approximate; the exact transitions are described in Table G.5.
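The leaf responses to a lock message described above, including the footnoted case of a node with no record, can be summarized in a short Python sketch. The role labels, the function name, and the message tuples are hypothetical; a leaf still in waiting_for_read, which delays the lock, is left out.

```python
from typing import List, Optional

def leaf_on_lock(role: str, writer: int, value: Optional[int] = None) -> List[tuple]:
    """Messages a leaf sends when a lock message reaches it during a write."""
    if role in ("purgeable", "no_record"):
        # Erase the copy (if there is one) and acknowledge immediately.
        return [("ack", "parent")]
    if role == "owner":
        # Send the old value and ownership straight to the writer, then delete and ack.
        return [("s_write_own", writer, value), ("ack", "parent")]
    if role == "writer":
        # The node that won the race marks itself as being on the write path.
        return [("ackl", "parent")]
    raise ValueError(f"unknown role: {role}")
```

The order in the owner case mirrors the text: the copy goes to the writer before the owner acknowledges its own deletion.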

The stream of ack and ackl messages signals the combining phase of a write. This phase is used to ensure that every copy of the block is deleted before any modifications are performed. Each parent node collects ack and ackl messages until it receives responses from all of its involved children. If the parent node is on the direct path between the writing node and the lca node, it sends up an ackl as soon as there is only one subtree below it with a copy and it has already received an ackl from a subtree. The single remaining subtree contains the node which requested the write. If the parent node is not on the write path, it waits until it receives ack messages from all subtrees directly below it which had copies, and then deletes its record of the block. The lca node for the block will be on the path for deletion. When it receives the last acknowledgment, it sends a write_ok message down to the node requesting the write and unlocks its cache entry. The write_ok message travels through all nodes that were on the write path, unlocking them as it descends.
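A per-parent sketch of this combining rule, in Python. `involved` is assumed to hold every child that was sent a lock message, including, for a node on the write path, the child leading toward the writer; the class and action names are hypothetical, and the unlocking that write_ok performs on its way down is not modeled.

```python
from typing import Iterable, List, Optional, Set

class WriteCombiner:
    """Acknowledgment combining at one parent entry during a write (sketch)."""

    def __init__(self, involved: Iterable[int], on_path: bool = False,
                 is_lca: bool = False) -> None:
        self.pending: Set[int] = set(involved)
        self.on_path = on_path or is_lca   # the lca is the top of the write path
        self.is_lca = is_lca
        self.ackl_from: Optional[int] = None
        self.done = False

    def on_ack(self, child: int) -> List[tuple]:
        self.pending.discard(child)        # that subtree no longer has a copy
        return self._maybe_finish()

    def on_ackl(self, child: int) -> List[tuple]:
        self.ackl_from = child             # this subtree contains the writer
        return self._maybe_finish()

    def _maybe_finish(self) -> List[tuple]:
        if self.done:
            return []
        if self.on_path:
            # Only the writer's subtree is left, and it has sent its ackl.
            if self.ackl_from is not None and self.pending == {self.ackl_from}:
                self.done = True
                if self.is_lca:
                    return [("write_ok", self.ackl_from), ("unlock",)]
                return [("ackl", "parent")]
            return []
        if not self.pending:               # off the write path: wait for every ack
            self.done = True
            return [("delete_record",), ("ack", "parent")]
        return []
```

For example, an off-path node with copies in children 4 and 6 stays silent after the first ack and answers `[("delete_record",), ("ack", "parent")]` after the second.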

The node requesting the write will receive two final messages, in an indeterminate order. One is the s_write_own message, which contains the value of the data and performs the ownership transfer. The other is the write_ok message, which indicates that all other copies in the system have been deleted. Only after receiving both messages does the node change the state of the entry to writable and perform the write. Figure 2.7 shows the finite state machine representing this sequence.
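Because the two final messages can arrive in either order, the writer simply waits for both. In this Python sketch, `PendingWrite` and its fields are hypothetical, and `state` collapses the waiting_for_write variants into a single label.

```python
from typing import Optional

class PendingWrite:
    """Writer-side completion of a global write (sketch)."""

    def __init__(self, new_value: int) -> None:
        self.new_value = new_value
        self.old_value: Optional[int] = None  # delivered by s_write_own
        self.value: Optional[int] = None
        self.have_ownership = False
        self.have_write_ok = False
        self.state = "waiting_for_write"

    def on_message(self, kind: str, value: Optional[int] = None) -> Optional[int]:
        """Handle s_write_own or write_ok; return the written value once done."""
        if kind == "s_write_own":
            self.have_ownership, self.old_value = True, value
        elif kind == "write_ok":
            self.have_write_ok = True
        else:
            raise ValueError(f"unexpected message: {kind}")
        if self.have_ownership and self.have_write_ok:
            # Both have arrived: the entry becomes writable and the write happens.
            self.state, self.value = "writable", self.new_value
            return self.value
        return None
```

Whichever message arrives first returns `None`; the second one completes the write.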

In the protocol, any read in progress when a write reaches a certain point will complete before the write does. In particular, when a lock message reaches a leaf node in the waiting_for_read state, the lock will be delayed at the node until a value is actually sent there. After the read, the block is purged from the cache, and an acknowledgment is sent. In order to avoid deadlock, the protocol always allows at least one node to distribute the old value on demand. Before the ownership transfer, the owner will have the value for distribution; after the ownership transfer, the writer will have and distribute the value. Reads attempting to complete during the later stages of a write often end up being sent to the writer.

2.1.6 A Synchronization Primitive

Although the read and write primitives which operate on shared memory ensure consistency, they do not provide a simple method for synchronization between nodes. We have therefore included the test-and-set (tas) instruction. This primitive is included for completeness, and could be implemented better by a variety of methods [11] [20].

The TAS is a combination of a read and a write. First, a read is performed, up to the point where a copy of the value is located. If the copy is non-zero, the TAS fails, and a tas_failed message is sent to the requesting node (see Figure 2.8). If the copy's value is zero, the write phase begins. The "write" continues just until the requesting node would be about to perform the write. At this point, the value is checked again. If it is non-zero, no value is written. If it is zero, the TAS completes successfully. This second check must be performed in order to ensure the atomicity of the test-and-set. A diagram of a successful TAS is shown in Figure 2.9.

Although the test-and-set primitive was designed with barrier synchronization in mind, it is still not as good as a mechanism specially designed for barrier synchronization. Synchronization using the provided test-and-set primitive does a preliminary read before attempting to gain ownership of the test-and-set variable, in order to reduce useless thrashing. To perform a barrier synchronization, however, every node will still have to gain ownership of the block at some point. On a machine such as the J-Machine, the separate message facility can be used by an application to build a more efficient barrier synchronization.

Figure 2.8: The diagram shows the steps of a test-and-set which fails in phase one. Node 5 tries to perform a TAS on X. Node 5 does not find X locally, and sends a locate message up to Node 2. Node 2 knows where a copy is, so it sends a message down to Node 4. Node 4 examines X, and finds that X is non-zero, implying that the test-and-set has failed. Node 4 therefore sends a message to Node 5 telling it that the tas has failed.

Figure 2.9: The diagram shows the steps of a test-and-set which completes successfully. The first part is the same as in Figure 2.8 and is not repeated here. After Node 4 verifies that X is 0, it begins the same steps as would happen in a write. The lca node for X (Node 1) is found. It sends lock messages to all nodes which have copies of X. Those nodes delete their copies, and send acknowledgments upwards. After both the value and the final acknowledgment are sent to Node 5, it checks to make sure X is still 0. If so, it sets X.

2.2 Physical Layout 

The hierarchy is mapped to a physical machine in such a way as to realize hierarchical locality as physical locality. The mapping is also designed to split the address space so as to increase bandwidth and prevent bottlenecks at higher levels of the tree. The mapping is designed to work for all k-ary n-cubes, although the protocol may not perform well on configurations such as high-dimensional cubes.

Each processor stores part of the global address space. The locations of every block are stored in a hierarchical directory, forming the virtual tree described in Section 2.1. The virtual tree is composed of virtual nodes, each of which may be mapped onto several physical processors. This mapping allows us to form different physical tree traversal patterns: one for each set of addresses.

2.2.1 Hierarchical Directory

A directory records which nodes have copies of blocks. In a multiple-level system, every parent node at level 1 knows which of its child nodes have copies of a block. Every parent node above level 1 stores which of its child nodes are the roots of subtrees containing copies of a block at their leaves. To locate a block that is not stored locally, a node sends an inquiry which will travel upwards until a copy is found.

In order to increase bandwidth, the directories of the virtual nodes at every level are split onto many physical processors. This splitting is shown in Figure 2.10. Each leaf node is mapped directly onto a unique physical processor. The parent (non-leaf) nodes are distributed equally onto all processors of the machine, while maintaining locality. The top node of the tree is distributed onto all nodes of the machine. The mapping is also designed to promote locality: every physical processor stores part of a node from every level. This implies that some requests can traverse the entire tree while staying local to a processor.

Figure 2.11 shows a hierarchical directory embedded into a two-dimensional mesh network. The highest level of a virtual tree consists of a single node. Its four children are the four level 2 nodes which compose that single node. The four children of a level 2 node are the four level 1 nodes, and those of a level 1 node are four level 0 nodes. Level 0 nodes correspond to leaves of the tree, and are physical processors. Each virtual parent node can contain information about any block, yet each of the physical processors composing a parent node can only hold some predetermined subset of the blocks, based on the block addresses. This mapping results in physical locality, because any messages traveling in the hierarchy will always stay within sub-cubes.

Figure 2.10: The virtual address tree is split to increase bandwidth. In this case, a 3 level radix 2 tree is mapped onto a 4-ary 1-cube. Each virtual leaf node is stored on a unique physical processor. The first-level parent nodes are each split onto two physical processors (forming sub-lines). The second-level parent nodes (in this case the root node) are split onto four physical processors (forming a sub-line of double the size of the first-level ones). In a k-ary 1-cube, the parent of a leaf node will be located in the same two-processor sub-line as the leaf node itself. The grandparent of a leaf node will be located in the same four-processor sub-line as the leaf node itself. For every additional level in the radix two tree, the number of processors needs to be doubled.

Figure 2.11: A conceptual view of a two, three, and four level tree. Each group at a level becomes a single node at the next higher level.

The hierarchical directory can also be viewed as consisting of multiple trees. As an example, consider the mapping of a virtual 3 level, radix 4 tree to a physical 4-ary 2-cube, shown in Figure 2.12. The collection of nodes that can store a particular address forms a complete tree. In this example, sixteen different trees are formed, each rooted at a different processor. Because the trees for different addresses are different, there is no bottleneck at the "top node" of the hierarchy.
Figure 2.12: Trees embedded into a 2-dimensional grid. Only two out of sixteen are shown.


2.2.2 Mapping Function

The mapping function is used to calculate the node number of the parent (or child) of a node, given an address, a level in the hierarchy, and the current node number. This particular mapping function only works for machines whose radices are powers of two.

A global address consists of two parts. The map part must encode the information necessary for the mapping function to operate, such as a global processor ID. The key part is used to distinguish among addresses with identical map parts, such as local addresses on a single processor. There are no restrictions as to how the map and key parts may be combined to form a global address.

Any node can store any block at the leaf level. To calculate the parent for that block, replace some part of the current node number with the map part. For example, on the J-Machine, which has a three-dimensional mesh network, take the low bits of the node number's three coordinates and replace these three bits with the corresponding three bits from the global address. This strategy implies that the highest level nodes will store only those blocks whose address map part equals their node number. Figure 2.13 illustrates the J-Machine mapping function.

Figure 2.13 (panels: Level 2, Level 1, Level 0): This figure demonstrates the mapping function used for a 3-dimensional mesh. The three node numbers indicate which nodes can store address H. H could be stored on any leaf (level 0) node. To calculate the level 1 node that H could be stored at, replace the low three coordinate bits, one from each dimension, with their corresponding value from H. To calculate the level 2 node, replace the next highest three coordinate bits, and so on.
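The J-Machine mapping rule can be sketched in a few lines of Python. The sketch assumes that node numbers and the map part are both given as (x, y, z) coordinate triples, that the level-L node replaces the low L bits of each coordinate (one bit per dimension per level, cumulatively, as in Figure 2.13), and a hypothetical 8x8x8 mesh; `node_at_level` and `hops` are illustrative names.

```python
from typing import Tuple

Coord = Tuple[int, int, int]

def node_at_level(leaf: Coord, map_part: Coord, level: int) -> Coord:
    """Node that may store an address at `level`, starting from a leaf.

    Bits 0..level-1 of every coordinate are taken from the map part;
    the higher bits still come from the leaf's own node number.
    """
    mask = (1 << level) - 1
    x, y, z = ((c & ~mask) | (m & mask) for c, m in zip(leaf, map_part))
    return (x, y, z)

def hops(a: Coord, b: Coord) -> int:
    """Distance in a mesh with bidirectional links and no wraparound."""
    return sum(abs(p - q) for p, q in zip(a, b))
```

For example, with leaf (5, 2, 7) and map part (3, 0, 6), the level 1, 2, and 3 nodes are (5, 2, 6), (7, 0, 6), and (3, 0, 6). On an 8x8x8 mesh the top-level node is the map part itself, which matches the observation that the highest level nodes store only blocks whose map part equals their node number.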

This mapping function will keep messages confined to physically small areas whenever possible. A message being sent from the leaf level to the first level will by definition have a destination somewhere within the eight-node (more generally, $2^n$-node) cube which includes the sender. Assuming bidirectional links, the farthest such a message would need to travel is three ($n$) hops. More generally, the farthest a message will have to travel to communicate between levels $i$ and $i+1$ is $n \cdot 2^{i}$ hops. On average, assuming random destinations, the distance is only $n \cdot 2^{i-1}$ hops. There will be more discussion of this embedding in Chapter 6.
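The hop counts quoted above follow from the mapping rule. Write $c_j$ for the $j$-th coordinate of the level-$i$ node, $m_j$ for the corresponding coordinate of the map part, $b_i(\cdot)$ for bit $i$ of a coordinate, and $[\cdot]$ for an indicator that is 1 when its condition holds. The step from level $i$ to level $i+1$ changes only bit $i$ of each coordinate (the lower bits already agree), so each differing coordinate moves exactly $2^{i}$ hops on a mesh with bidirectional links. Assuming, as the text does, random destinations, so that each bit differs with probability 1/2:

```latex
d_{i \to i+1} \;=\; \sum_{j=1}^{n} 2^{i}\,\bigl[\, b_i(c_j) \neq b_i(m_j) \,\bigr],
\qquad
\max d_{i \to i+1} = n \cdot 2^{i},
\qquad
\mathbb{E}\bigl[d_{i \to i+1}\bigr] = n \cdot 2^{i} \cdot \tfrac{1}{2} = n \cdot 2^{i-1}.
```

For the J-Machine's first step ($n = 3$, $i = 0$) this gives at most 3 hops and 1.5 hops on average.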

2.3 Summary

This chapter described the operations of PHD. PHD supports cache-coherent read, write, and test-and-set operations. Read requests are satisfied in the smallest subtree containing both the requester and a copy of the requested block; only three sets of messages are sent up or down that subtree. Write requests are confined to the subtree containing the lowest common ancestor of the requester and all copies of the requested block; four sets of messages traverse the hierarchy, two of which fan out to all nodes with copies. The test-and-set request is implemented as an optimized combination of read and write requests, and implements a test-and-test-and-set operation.

This chapter also described a mapping of PHD to arbitrary k-ary n-cubes. The mapping translates hierarchical locality into physical locality. The mapping also statically spreads higher-level tree nodes onto many physical processors, in order to increase bandwidth and prevent bottlenecks at the top of the tree.



Chapter 3


Protocol Issues 


Many decisions must be made in the design of a complex system. These decisions often involve tradeoffs between space, time, and complexity. This chapter discusses some of the tradeoffs that were made in the design of the Protocol for Hierarchical Directories, as well as the consequences of these decisions.

Section 3.1 examines those tradeoffs intended to increase the parallelism, in comparison to other hierarchical protocols, by increasing the asynchrony. Section 3.2 considers small, easily changeable design decisions that further optimize the performance of the protocol.

3.1 Parallelism

PHD was designed to reduce the serialization of protocol actions by introducing parallelism in the satisfaction of requests. Parallelizing a problem, however, often makes the problem more complex. Many of the choices made in the protocol design therefore significantly increased the complexity of the protocol. Whether there is a commensurate decrease in latency is an open issue to be studied.

3.1.1 Extra Traversals of the Hierarchy

Extra traversals of the hierarchy provide information to a protocol. Avoiding extra traversals of the hierarchy increases both the state necessary to support a protocol and the complexity of a protocol. There are two specific cases of this tradeoff in PHD: one in the read request mechanism, and one in the asynchronous invalidate mechanism. We briefly compare four different solutions to this tradeoff, three of which are part of existing protocols.

Figure 3.1: This figure compares the number of traversals of the hierarchy needed for a read request for the four different protocols: DDM, PHD, NAI, and DHP. The black node is performing a read request. The grey nodes have copies of the block being requested.

DDM. The DDM protocol [12] requires more traversals of the hierarchy than do any of the other protocols. This requirement is reasonable given the assumptions of that project: they propose to implement their protocol on a bus-based system, where the hierarchy is fixed in hardware and cannot be circumvented. During a read request, four traversals of the hierarchy occur (see Figure 3.1): first up, to find a node which knows where a copy is; then down, to a node with a copy; then back up and down through the network, updating the interior directories as the read occurs.

There is only partially asynchronous invalidation in the DDM protocol. In order to discard a block, a node must initiate a transaction which carries the data. This transaction will continue to propagate upwards until at least one other copy of the block is found. This system prevents the protocol from deleting the last copy of a block. The transaction must carry the value with it, unlike in the other protocols, in order to be sure that the value is preserved.

This protocol differentiates four read states in the hierarchy for a subtree: invalid, reading, answering, and valid data. These states are updated as the read travels twice up and down the hierarchy, and provide full information to the protocol as to the exact stage of a read. The valid data state only implies that the data has been sent into the subtree, not that the data is still there.

PHD: The PHD protocol requires one fewer traversal of the hierarchy for a read request than does the DDM protocol. PHD, as illustrated in Figure 3.1, also routes the read request up and down through the hierarchy, but then sends a copy of the value directly through the network to the node requesting the read, and then updates the hierarchy by a confirmation sent only upwards.

PHD can also discard blocks completely asynchronously. This feature allows most nodes to quickly discard cache entries whenever their caches are full. Unlike in the DDM protocol, the value is not carried in the discard message. The use of a special owner node guarantees that not all nodes can simultaneously discard their copies. The owner, which is defined as the last node to write to a block, cannot asynchronously discard its copy. If a particular node is the only node to write to many blocks, its cache will eventually fill up. PHD can be extended to solve this problem by adding a message which requests that some other node write the value (freeing it from the full node's cache). This solution introduces complicated load-balancing issues not addressed by this thesis.
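The owner rule can be captured in a few lines. The following is a minimal Python sketch under stated assumptions: `BlockCopies` is a hypothetical per-block record, invalidation on a write is reduced to replacing the holder set, and message traffic is not modeled.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class BlockCopies:
    """Hypothetical bookkeeping for one block under PHD's owner rule."""
    owner: int                               # last node to write the block
    holders: Set[int] = field(default_factory=set)

    def __post_init__(self) -> None:
        self.holders.add(self.owner)         # the owner always holds a copy

    def read(self, node: int) -> None:
        self.holders.add(node)

    def write(self, node: int) -> None:
        # The writer becomes the owner; all other copies are invalidated.
        self.owner = node
        self.holders = {node}

    def try_discard(self, node: int) -> bool:
        """Asynchronous discard, refused for the owner.

        No data travels with the discard, so at least one copy (the owner's)
        must survive.
        """
        if node == self.owner or node not in self.holders:
            return False
        self.holders.remove(node)
        return True
```

Because every discard by a non-owner is accepted and the owner's is always refused, the holder set can never become empty, which is the guarantee the owner node provides.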

The combination of these two features introduces complexity to the protocol. Although invalidation is now simpler, because the value is not carried in the deletion message, the problem of owner capacity overflow has been introduced. The longest path for a read request is now shorter than it was before, but the read-combining path is slightly longer. Reads which have been combined do not receive the value of the data until after the confirmation step of the read request. A read request that has been combined must, in the worst case, wait through two traversals of the hierarchy (one up and one down), one message being sent across, a confirmation being sent up through the hierarchy, and the data finally being sent down to the combined nodes.
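To make the difference between a first read and a combined read concrete, the sketch below counts sequential message steps along each path. The parameter names are hypothetical, and counting the direct network transfer as a single step is an assumption of this illustration rather than a property of PHD.

```python
from typing import Tuple


def phd_read_steps(h_up: int, h_down: int,
                   h_confirm: int, h_deliver: int) -> Tuple[int, int]:
    """Sequential message steps for the first read and a combined read in PHD.

    h_up:      levels the first read climbs before finding a copy
    h_down:    levels it then descends to a node holding the copy
    h_confirm: levels the confirmation climbs back up the hierarchy
    h_deliver: levels the data then descends to a combined reader
    The direct data transfer across the network is counted as one step.
    """
    first_reader = h_up + h_down + 1                     # up, down, direct transfer
    combined_reader = first_reader + h_confirm + h_deliver
    return first_reader, combined_reader
```

With every segment spanning three levels, `phd_read_steps(3, 3, 3, 3)` gives 7 steps for the first reader and 13 for a reader combined with it.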

PHD also differentiates four read states in the hierarchy for a subtree: invalid, reading, waiting for a read combination, and valid data. The valid data state means that the subtree received the data but may have already deleted it.

DHP: The DHP protocol [21] both requires the fewest traversals of the hierarchy and implements asynchronous invalidation. Together, this combination results in a protocol vulnerable to deadlock, because not enough information is available in the hierarchy to tell whether a subtree is waiting to receive a block, has already received the block, or has received and already deleted the block (but not yet propagated this information upwards). This lack of information is used as the read-combining mechanism: if a read request reaches a node which is in the process of reading data, it waits there until the data arrives. This mechanism by itself is perfectly reasonable. Unfortunately, when combined with asynchronous invalidation, the mechanism results in deadlock: two nodes can each end up waiting for the same block, each also promising to tell the other when it receives the block. In this case neither request can ever be filled.
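The DHP deadlock can be viewed as a cycle in a waits-for graph: each node is waiting on the other for the same block. The sketch below detects such a cycle; it is an illustration of the failure mode, not a mechanism the DHP provides, and `find_wait_cycle` and its input format are hypothetical.

```python
from typing import Dict, Hashable, List, Optional


def find_wait_cycle(waits_for: Dict[Hashable, Hashable]) -> Optional[List[Hashable]]:
    """Return one cycle in a waits-for graph, or None if there is none.

    waits_for maps a node to the node it is waiting on for a block; in this
    sketch each node has at most one outstanding wait.
    """
    for start in waits_for:
        path: List[Hashable] = []
        node = start
        while node in waits_for and node not in path:
            path.append(node)
            node = waits_for[node]
        if node in path:                     # walked back onto our own path
            return path[path.index(node):]
    return None
```

Two nodes that have each promised the other the block form the two-element cycle `{"A": "B", "B": "A"}`; neither wait can ever be satisfied.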

The read-combining mechanism of this protocol theoretically reduces the path length of the longest combined read. The only problem which can occur is read chains, where a list of nodes is waiting for a block. The values will propagate one at a time: upon receiving the value it has been waiting for, a node forwards that value to its own list of waiting nodes. Those nodes in turn might themselves have lists of other nodes waiting for the same value.
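The cost of a read chain is set by its depth. Below is a minimal Python sketch, assuming a hypothetical `waiters` map from each node to the nodes waiting on it and one forwarding round per link; it ignores the serialization of several sends from the same node.

```python
from collections import deque
from typing import Dict, Hashable, List


def delivery_rounds(waiters: Dict[Hashable, List[Hashable]],
                    source: Hashable) -> Dict[Hashable, int]:
    """Forwarding round in which each node of a read chain gets the value.

    The value reaches `source` in round 0; a node forwards it to its own
    waiting list one round after receiving it.
    """
    rounds = {source: 0}
    frontier = deque([source])
    while frontier:
        node = frontier.popleft()
        for waiter in waiters.get(node, []):
            if waiter not in rounds:
                rounds[waiter] = rounds[node] + 1
                frontier.append(waiter)
    return rounds
```

In a chain A → B → D → E, node E receives the value only after three forwarding rounds, even though it may have issued its read at nearly the same time as A.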

The DHP can only differentiate two states for subtrees in the hierarchy: invalid and valid, where valid implies that the subtree will receive the data, has received the data, or had received (and already deleted) the data. It cannot use any other states because every node is only visited once.

NAI: A fourth protocol, not yet proposed, is identical to the DHP except that it eliminates the asynchronous invalidation ability. We call this protocol NAI (No Asynchronous Invalidation). Read requests still take only two traversals of the hierarchy. There are still only two states for subtrees in the hierarchy, but the meanings have changed: now the hierarchy keeps track of whether a subtree is invalid or has (or will get) a particular block. This protocol eliminates the deadlock situation of the DHP by guaranteeing that a “valid” subtree either has a copy of the block or has an outstanding read request which is being satisfied outside of that subtree.

The disadvantage of this protocol is that it requires nodes to reserve enough room in their caches for the blocks that they delete between the time that the deletion is initiated and the time that they receive an acknowledgment indicating that it is safe to perform the deletion.
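The four designs can be summarized side by side. This Python sketch records only what the text above states (read traversals, per-subtree read states, and asynchronous invalidation); the names are hypothetical, and the worst-case hop formula assumes every traversal spans the full `height` and, for PHD, leaves out the one direct network transfer.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ReadScheme:
    traversals: int                   # passes (up or down) through the hierarchy per read
    subtree_states: Tuple[str, ...]   # read states a directory records per subtree
    async_invalidation: str


PROTOCOLS = {
    "DDM": ReadScheme(4, ("invalid", "reading", "answering", "valid data"),
                      "partial: the discard carries the data upward"),
    "PHD": ReadScheme(3, ("invalid", "reading", "waiting for combination", "valid data"),
                      "full, except for the owner's copy"),
    "DHP": ReadScheme(2, ("invalid", "valid"), "full, but deadlock is possible"),
    "NAI": ReadScheme(2, ("invalid", "valid"), "none"),
}


def worst_case_read_hops(protocol: str, height: int) -> int:
    """Hierarchy hops for one read if every traversal spans `height` levels."""
    return PROTOCOLS[protocol].traversals * height
```

For a copy found three levels up, this gives 12 hierarchy hops for DDM, 9 for PHD (plus its direct transfer), and 6 for DHP and NAI.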
Figure 3.2: This figure shows how the distributed write commit works. Node 1 is in the process of requesting a write. The grey nodes have copies of the object. Consider what happens if Nodes 2-8 all requested read operations at this point, and the lock messages all froze in the network, so that the reads would have time to complete. Nodes 4 and 6 would complete their read requests, because they have locally cached copies. The requests from Nodes 3, 7, and 8 would stall on the write, because they would reach a locked node before reaching a valid node. Node 5, on the other hand, would be able to complete its read, because a valid node above it is still unlocked. Node 2 would be able to complete its read, because its request is coming from the writing subtree.


3.1.2 Distributed Write Commit Point

The commit point for a write is distributed in PHD. A write waits until all reads in progress finish. After a particular time, newly starting reads will be stalled until the completion of the write; the calculation of this time is distributed. As soon as a lock message reaches a node, no read originating from any invalid subtree below it will be satisfied until after the write. The one exception to this rule is that reads coming from nodes in currently writing subtrees will be allowed to complete as well. An example of the various cases of reads and writes interacting to form the commit point is shown in Figure 3.2. This scheme supports sequential consistency [14], but also adds complexity to the protocol.
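The read outcomes in Figure 3.2 follow a simple rule, sketched below in Python. The function and its (valid, locked) input are hypothetical simplifications: a read stalls if the first relevant ancestor it meets is locked, completes if it first meets an unlocked valid node, and always completes if it has a local copy or comes from the writing subtree.

```python
from typing import Sequence, Tuple


def read_outcome(has_local_copy: bool, from_writing_subtree: bool,
                 ancestors: Sequence[Tuple[bool, bool]]) -> str:
    """Classify a read issued while a PHD write is in progress.

    ancestors lists (valid, locked) for the requester's directory ancestors,
    nearest first; "valid" means the node's subtree holds a copy.
    Returns "complete" or "stall".
    """
    if has_local_copy or from_writing_subtree:
        return "complete"            # Nodes 4 and 6, and Node 2, in Figure 3.2
    for valid, locked in ancestors:
        if locked:
            return "stall"           # lock reached before any valid node
        if valid:
            return "complete"        # an unlocked valid node serves the read
    raise ValueError("no valid ancestor: outside the cases this sketch models")
```

Nodes 3, 7, and 8 in the figure correspond to an ancestor list whose first locked entry precedes any valid one; Node 5 corresponds to one whose first valid entry is still unlocked.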

The DHP protocol, on the other hand, does not make any guarantees about reads making progress. It is possible for a read in the DHP to be indefinitely delayed by a series of write requests coming from other nodes; the read will spend all of its time searching the machine for a valid copy.
3.2 Design Decisions

Several design decisions which were made in the construction of the Protocol for Hierarchical Directories could easily be varied. These decisions involve the read-combining mechanism, the write-invalidation mechanism, and the invalidation mechanism for non-leaf nodes.

3.2.1 Read Combining

The design of the read-combining mechanism is another example of a time-space tradeoff.

The read combining of PHD delays read requests from later requesting nodes until the requests from earlier requesting nodes have been answered. This delay prevents the later requests from sending another set of location and data-transfer messages. Another way to implement read combining is to use address-specific delaying queues which are examined whenever a block's directory state on a node changes. This strategy saves some bits in the state of each node, because a slot to store the waiting status of every subtree for every block is no longer needed.
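A minimal sketch of such address-specific delaying queues follows, assuming requests are opaque values and that the caller reports each change of a block's directory state; the class and method names are hypothetical. The point of the design is that waiting information exists only while a request is actually parked, instead of as a per-subtree slot in every directory entry.

```python
from collections import defaultdict, deque
from typing import Any, DefaultDict, Deque, Hashable, List


class DelayingQueues:
    """Per-address queues of requests parked until that block's state changes."""

    def __init__(self) -> None:
        self._queues: DefaultDict[Hashable, Deque[Any]] = defaultdict(deque)

    def park(self, address: Hashable, request: Any) -> None:
        self._queues[address].append(request)

    def state_changed(self, address: Hashable) -> List[Any]:
        """Release, in arrival order, every request waiting on `address`.

        Only this address's queue is examined; other blocks are untouched.
        """
        return list(self._queues.pop(address, ()))
```

A later read for a block already being read would be parked under that block's address and released, together with any other parked reads, when the first read's data changes the directory state.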

Using these queues provides two methods for deciding what to do when a lock message reaches a directory node which has children waiting for the value. The first is to send the lock message to all waiting child subtrees, as in the current version of PHD. Unlike PHD, however, this scheme requires that a lock message always check the delaying queue before continuing, in order to find out which subtrees need copies of the value. The second scheme is to not lock waiting subtrees; the reads coming from those subtrees are considered to happen after the write.

Another interesting question to consider about read combining is whether it is worthwhile at all. Without read combining, the protocol becomes substantially simpler. It is not clear how often nodes request the same value nearly simultaneously, except for special synchronization variables; these could be handled separately.

3.2.2 Read Combining Whenever Possible

The read-combining mechanism of PHD combines two read requests whenever they occur nearly simultaneously. When a read reaches a node with a subtree that is already reading that block, and that node has no subtrees which definitely have copies of the block, read combination occurs. The question is what to do in the situation, shown in Figure 3.3, where a read request propagates up to a node that has both a subtree in the middle of a read and a subtree with a definite copy of the block. If a new read request is sent down to the subtree with a copy, one-fifth of the protocol table (shown in Table C.1) will no longer be reachable and can be eliminated, because the combined vector state vX_wX_cX (subtrees may be invalid, valid, waiting, or confirmed) can no longer occur. The other possibility for implementation is that the read request be combined, and thus forced to wait until the first read completes, when it will be sent its value. Both versions of the protocol have been simulated. The scheme which combines read requests performed better on the studied synthetic traces and is currently implemented in the system.

Figure 3.3: This figure shows the two possibilities for read combining. In both methods, Node 1 requests a read operation. Since it has no copy of the block, the request is sent up to the first node that has it, in this case the root node. Now Nodes 2, 4, and 12 issue read requests to the same block. Node 2's request waits at its parent, as explained in Chapter 2. Similarly, Node 4's request waits at its grandparent. The issue is what happens to the request from Node 12. The request could be sent down the path to Node 6, as Node 1's request was, or the request could be combined.

3.2.3 Write Invalidate versus Write Update

Another interesting issue is whether to invalidate a block or to update it with the new value when a remote write occurs. PHD could be modified to use an update scheme instead of an invalidate one. For performance, a mechanism to periodically remove unused copies of blocks would be required. Without this capability, writes would involve all nodes that had ever read the block and whose caches had not subsequently chosen to discard it. Write update could be extremely valuable in some situations, such as where a few nodes were constantly sharing data.
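The cost difference between the two policies shows up on repeated writes. In this hypothetical Python sketch (`remote_write` and the per-node cache dictionaries are illustrative, not PHD code), invalidation contacts the sharers once, while update keeps contacting every node that has ever read the block until something removes its copy.

```python
from typing import Dict

Caches = Dict[int, Dict[int, int]]   # node id -> {address: value}


def remote_write(copies: Caches, writer: int, address: int,
                 value: int, policy: str) -> int:
    """Apply one write under "invalidate" or "update".

    Returns how many nodes other than the writer had to be contacted.
    """
    contacted = 0
    for node, cache in copies.items():
        if node == writer or address not in cache:
            continue
        contacted += 1
        if policy == "invalidate":
            del cache[address]           # the sharer loses its copy
        elif policy == "update":
            cache[address] = value       # the sharer keeps a fresh copy
        else:
            raise ValueError(f"unknown policy {policy!r}")
    copies.setdefault(writer, {})[address] = value
    return contacted
```

With three nodes sharing a block, two successive writes by node 1 contact two nodes and then none under invalidation, but two nodes both times under update.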

3.2.4 Non-leaf Invalidation

The protocol does not currently address the issue of a full cache in a non-leaf node. When non-leaf nodes fill up, we cannot simply discard the non-leaf values. Scott and Goodman have addressed this problem in [24]. Their solution is pruning caches, which could be adapted to work in PHD. In their pruning-cache scheme, non-leaf directory entries may be discarded when a cache is full. Pruning caches store information about where a block does not reside rather than where it does reside. In their protocol, if a lock message ever reaches an invalid entry located in a node whose parent has a valid entry, lock messages must be broadcast to all children. Scott and Goodman have determined that “pruning caches with a modest hit rate significantly reduce the invalidation traffic.” They also found, in their simulation studies, that when a cache filled up and they needed to discard an entry, “it is better to suffer increased invalidation traffic when the line is written than to prematurely invalidate the line.”
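One reading of the pruning-cache forwarding rule is sketched below. It is an interpretation, not Scott and Goodman's algorithm: the entry is modeled as the set of children known to hold no copy, and it is assumed that a lock reaches this node only because its parent's entry is valid. The function name is hypothetical.

```python
from typing import List, Optional, Sequence, Set


def lock_forward_targets(children: Sequence[int],
                         pruned: Optional[Set[int]] = None) -> List[int]:
    """Children a lock (invalidation) message must visit at one directory node.

    children: ids of this node's child subtrees.
    pruned:   children this node's pruning-cache entry records as holding no
              copy, or None if the entry was discarded when the cache filled.
    """
    if pruned is None:
        # The parent still says this subtree is valid, but the entry is gone:
        # no information survives, so the lock is broadcast to every child.
        return list(children)
    return [c for c in children if c not in pruned]
```

A surviving entry lets the lock skip subtrees known to be empty; a discarded entry costs extra invalidation traffic instead of a premature invalidation, which matches the tradeoff quoted above.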

3.3 Summary

This chapter examined some of the decisions made in the design of the Protocol for Hierarchical Directories. Some of these decisions are easily changeable implementation issues. Others, such as the number of traversals of the hierarchy that should be made, expose major differences between PHD and other protocols.

Although the minor decisions could easily be isolated and tested to determine which perform better, the major ones cannot be tested in isolation, as they are not necessarily separable from each other. In order to determine the benefits of these major decisions, we must compare the performance of PHD and the other hierarchical cache coherence protocols for a variety of benchmarks.



Chapter 4


Simulator


We wrote a simulator to model the operation of the protocol running on a computer, such as the J-Machine, with a k-ary n-cube network topology. The simulator currently models machines of 64 nodes with two or three dimensions. The simulator is trace-driven, taking as input a statically scheduled list of memory references and simulating them by following the protocol. It outputs a log file detailing the steps it took. We also wrote a verification program which takes the output of the simulator and verifies that it follows a legal ordering of events.

4.1 Overview

The simulator serves two purposes: first, it tests the protocol, and second, it provides a platform for studying several characteristics of protocol behavior. In particular, it provides a method to examine the number of messages sent per operation, the longest path traveled per operation, and the average height in the tree reached per operation for different types of operations. The results of this study are in Chapter 5. The simulator was not designed to support an analysis of how the protocol behaves when burdened by network constraints and different costs for different activities. An analysis of these issues is located in Chapter 6.

The simulator operates at the message level; one unit of simulated time is the time a message takes to travel one hop between two adjacent nodes. The time a message takes to travel from node A to node B is therefore equal to the distance in hops between nodes A and B. Operations which can be satisfied locally, such as a local read, write, or test-and-set, occur instantaneously on a single node. Message processing for a node, on the other hand, takes a constant amount of time, 10 hops, during which the node is busy and can process no new events.

Time and event sequencing is represented by an event-driven queue. The queue represents a range of time. Each slot in the queue corresponds to one particular time, and contains a list of events to occur at that particular time. There is one global queue for the simulation plus one local queue per processor.

Events are removed from the queues and processed according to their type: messages or operations. The simulator supports local allocation, read, write, and test-and-set operations. It additionally supports all of the types of messages specified by the protocol. The global event queue also supports printing events, cache-emptying events, warm-start events, and memory-dumping events.

In addition to listing every operation as it completes, the simulator can be configured to print any of the following log information: events processed (as they occur in the event queue), messages processed, and messages sent. The simulator can also be configured to print out many different types of statistics about the protocol and its operation.

4.2 Data Layout

The “global” memory of the system is scattered throughout the nodes. Each node has a section of its memory devoted to storing the data blocks that it has copies of, or knows about. It also has a section which contains mappings from an {address, level} pair to a pointer to the data block stored in the data section.

There are two types of entries which might be pointed to from the data-mapping table.

The first type is a leaf entry. It represents an actual block of data, and corresponds to a block of memory which would be found in a node located at the bottom of the hierarchy.

The second type is a parent entry. A parent entry stores information about which subtrees have copies of particular blocks. These entries correspond to memory which would be found on a node of the hierarchy not located at the leaf level.

A leaf cache entry, shown in Table 4.1, takes up N + 2 words, where N is the line size.

Table 4.1: The parts of a leaf cache entry.

Table 4.2: The parts of a parent cache entry.


The first word contains the state of the entry, as described in Chapter 2. During writes, it also records which word in the cache line is being written. The second word encodes the global address of the object and the level. The final N words are the value of the block starting at the address. Because only the address representing the start of the block is important, this implementation hides the level in those redundant bottom address bits.
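Hiding the level in the low address bits works because a block-start address is a multiple of the line size. The sketch below assumes word addressing and a line size of eight words; `LINE_WORDS`, `pack_tag`, `unpack_tag`, and `leaf_entry_words` are hypothetical names, and the level must be smaller than the line size to fit.

```python
from typing import Tuple

LINE_WORDS = 8   # hypothetical line size N; must be a power of two


def pack_tag(block_address: int, level: int) -> int:
    """Hide the hierarchy level in the unused low bits of a block-start address."""
    if block_address % LINE_WORDS:
        raise ValueError("not the first word of a block")
    if not 0 <= level < LINE_WORDS:
        raise ValueError("level does not fit in the spare low bits")
    return block_address | level


def unpack_tag(tag: int) -> Tuple[int, int]:
    """Recover (block_address, level) from a packed tag word."""
    mask = LINE_WORDS - 1
    return tag & ~mask, tag & mask


def leaf_entry_words(line_words: int = LINE_WORDS) -> int:
    return line_words + 2   # state word + tag word + N data words
```

For instance, block 0x1230 at level 3 packs to the single tag word 0x1233, and an 8-word line gives a 10-word leaf entry.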

A parent entry is any entry which does not correspond to a leaf of the hierarchy. It is always composed of exactly two words. The first word contains the bits specified by the protocol, as well as a vector of up to sixteen bits indicating which subtrees have copies of that object. The vector contains two bits per subtree. The writer field, which is used only during writes by nodes located on the path between the write requester and the lowest common ancestor of all copies, stores the index of the subtree that is performing the write, so that lost read requests can be routed to the writer, as described in Chapter 2. A 32-bit entry can actually store up to 12 subtrees, although only the capability for eight is in the current simulator version. The second word encodes the global address of the object and its level, exactly as in the leaf entry.
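The two-bit-per-subtree vector can be manipulated with shifts and masks. In this sketch the mapping of the four codes to the subtree states invalid, valid, waiting, and confirmed (the states named in Section 3.2.2) is an assumption, as are the function names; the limit of eight subtrees follows from the sixteen-bit vector.

```python
SUBTREE_STATES = ("invalid", "valid", "waiting", "confirmed")  # assumed codes 0..3
BITS_PER_SUBTREE = 2
MAX_SUBTREES = 16 // BITS_PER_SUBTREE                         # eight subtrees


def set_subtree(vector: int, index: int, state: str) -> int:
    """Return `vector` with subtree `index` set to `state`."""
    if not 0 <= index < MAX_SUBTREES:
        raise IndexError("subtree index out of range")
    shift = BITS_PER_SUBTREE * index
    code = SUBTREE_STATES.index(state)
    return (vector & ~(0b11 << shift)) | (code << shift)


def get_subtree(vector: int, index: int) -> str:
    """Decode the state recorded for subtree `index`."""
    return SUBTREE_STATES[(vector >> (BITS_PER_SUBTREE * index)) & 0b11]
```

Overwriting a subtree's field clears only its own two bits, so the other seven subtrees' states are unaffected.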

The simulator manages the simulated heap by using a memory allocation manager. The manager prevents memory from being fragmented by rearranging memory whenever a block is freed, and by throwing away purgeable blocks when necessary. The simulator also provides a method for removing blocks from the cache on demand from the input, in order to test the protocol.

4.3 The Simulator

The main parts of the simulator are the node model, the network model, and the event-driven queues.

4.3.1 Node Model

Only the state of a node essential to the operation of the simulator is modeled. Every node has a node id, a local event queue, a set of delaying queues, and memory. Associated with the memory is a table which stores mappings between {address, level} pairs and pointers into the memory. In an implementation on an actual machine, the table would be part of the memory.

Because the processing of a message in a node occurs in a single time-step, part of the state of a node records whether or not the node is busy and, if so, for how long. A special event, node_done, is added to the local event queue at the time when a node should finish processing the current message. A node will perform no actions in the meantime.

Each node also stores a special association list recording the value to be written for any ongoing write. This information would normally be stored directly in the instruction stream.

Ideally, each node can timeshare among several different processes. When a process makes a global memory reference which is not locally satisfiable, it suspends while the reference is filled. In the meantime, other processes can run.

The simulator models this ability by allowing n global requests per processor to be occurring simultaneously, where n is an execution-time parameter. Operations which have already been read in from the input trace are placed in a reference queue which is part of the node model, but is disjoint from the event queues. Whenever an operation on a node completes, the queue for that node is checked. If the next operation on that queue is due to happen in the future, it is scheduled. If the next operation was supposed to happen already, it is started. If there is no waiting operation, the parser is invoked to read more input.
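The reference-queue policy can be sketched as follows. This is a simplified Python model, assuming a hypothetical `NodeIssue` class and a trace of (issue time, operation) pairs; it starts due operations while fewer than n are outstanding, and leaves out the scheduling of future operations and the call back into the parser.

```python
from collections import deque
from typing import Deque, Iterable, List, Tuple


class NodeIssue:
    """Hypothetical per-node model allowing up to n outstanding global requests."""

    def __init__(self, n: int, trace: Iterable[Tuple[int, str]]) -> None:
        self.n = n
        self.outstanding = 0
        self.pending: Deque[Tuple[int, str]] = deque(trace)   # trace order

    def issue_ready(self, now: int) -> List[str]:
        """Start every due operation while fewer than n are in flight."""
        started = []
        while (self.pending and self.outstanding < self.n
               and self.pending[0][0] <= now):
            started.append(self.pending.popleft()[1])
            self.outstanding += 1
        return started

    def complete(self, now: int) -> List[str]:
        """One request finished: free its slot and check the queue again."""
        self.outstanding -= 1
        return self.issue_ready(now)
```

With n = 2, a trace of three operations due at time 0 starts two immediately and the third only when one of them completes.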
4.3.2 Network Model

The network of the J-Machine is a three-dimensional mesh. The simulator models this network, or optionally a two-dimensional mesh, in a completely unloaded condition, i.e., under zero congestion. Message delivery takes time proportional to the distance between the nodes along a Manhattan route: first the X direction is followed, then the Y, then the Z. A message is “sent” at the end of the period of time corresponding to any processing that a node is doing.

We have chosen parameters such that each hop in the network takes one-tenth of the time needed to process a message. This models a system with balance between computation and communication for fine-grained processing. The longest message sent is 4 + N words (N is the line size); the shortest is 2 words. All messages are approximated as being the same length for purposes of arrival time. If the destination of a message is the node that generated the message, the transmission is suppressed and the computation continues immediately.
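Under the unloaded-network assumption, a message's arrival time is just the end of the sender's processing plus the dimension-order distance. The sketch below assumes integer mesh coordinates and the 10-hop processing time given in Section 4.1; the function names are hypothetical. It works for both the two- and three-dimensional meshes.

```python
from typing import List, Sequence, Tuple

PROCESSING_TIME = 10   # message processing takes 10 hop times (Section 4.1)


def xyz_route(src: Sequence[int], dst: Sequence[int]) -> List[Tuple[int, ...]]:
    """Nodes visited by dimension-order routing: X first, then Y, then Z."""
    cur = list(src)
    path = [tuple(cur)]
    for dim in range(len(cur)):
        step = 1 if dst[dim] > cur[dim] else -1
        while cur[dim] != dst[dim]:
            cur[dim] += step
            path.append(tuple(cur))
    return path


def arrival_time(processing_start: int, src: Sequence[int],
                 dst: Sequence[int]) -> int:
    """Send at the end of processing, then one time unit per hop (no congestion).

    A message to the sending node itself has zero hops, matching the
    suppressed transmission described above.
    """
    hops = len(xyz_route(src, dst)) - 1
    return processing_start + PROCESSING_TIME + hops
```

On a 4 × 4 × 4 mesh (one possible 64-node layout), the farthest message, from corner (0, 0, 0) to corner (3, 3, 3), arrives 19 time units after its sender begins processing.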

4.3.3 Event-Driven Queues

Time is implemented as a circular list of queues. At the start of a simulation, the simulator reads a block of the input and schedules the specified events. It places each event in the queue entry representing the appropriate time, creating entries as needed. It then begins processing the queue. The simulator processes input on a node-by-node basis when any node runs out of operations to perform, as more fully described in Section 4.3.1. When there are no more events in any of the queues, the simulator halts.
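A circular list of per-time queues is commonly called a timing wheel. The Python sketch below is a minimal single-wheel version under stated assumptions: delays must be shorter than the wheel, and the split into a global queue and per-node local queues is omitted; `TimeWheel` and its methods are hypothetical names.

```python
from collections import deque
from typing import Any, Callable, Deque, List


class TimeWheel:
    """Circular array of per-time-step event queues."""

    def __init__(self, slots: int) -> None:
        self.slots: List[Deque[Any]] = [deque() for _ in range(slots)]
        self.now = 0
        self.pending = 0   # events scheduled but not yet handled

    def schedule(self, delay: int, event: Any) -> None:
        if not 0 <= delay < len(self.slots):
            raise ValueError("delay exceeds the time range the wheel represents")
        self.slots[(self.now + delay) % len(self.slots)].append(event)
        self.pending += 1

    def run(self, handler: Callable[[int, Any, "TimeWheel"], None]) -> None:
        """Advance time slot by slot; halt when no events remain anywhere."""
        while self.pending:
            slot = self.slots[self.now % len(self.slots)]
            while slot:
                event = slot.popleft()
                self.pending -= 1
                handler(self.now, event, self)   # may schedule further events
            self.now += 1
```

Reusing slots modulo the wheel size is what lets a fixed-size structure represent an unbounded simulation, as long as no event is scheduled further ahead than the wheel's range.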

Global Queue: The global event-driven queue keeps track of the events that are to be activated during each time slice. There are several different types of events. Operations are specified by the input file, and include READ, WRITE, TAS, and ALLOC. As mentioned earlier, there are also various types of debugging events, not necessarily specific to particular nodes, which can be specified.

Local Queue: Local event-driven queues are used, one per node, to keep track of node-specific events, such as messages, and global events that become local. Messages are generated in response to other messages or operations. These queues correspond most closely to the message queues that would be found on some machines.

Delaying Queue: Each node additionally has delaying queues. Events which cannot be satisfied until another event occurs are placed on an address-specific delaying queue, and woken up only when an event referring to that address occurs.

4.4 Verification

The simulator was tested by running a hand-written set of tests, designed to exercise all of the features of the protocol, as well as many sets of machine-generated synthetic address streams. The simulator also contains self-consistency code, ensuring that an error is signalled if illegal state and message combinations occur. A verifier was written in order to help verify the simulator.

4.4.1 Verifier

We wrote a verification program which takes the output of the simulator and ensures that the output sequence of reads, writes, and test-and-sets is a legal ordering of the requested events. The verifier itself was checked against a large set of hand-crafted test cases. The rules that the verifier obeys are as follows:

• Any read that finishes before a write operation starts will see the old value.

• After a write operation finishes, the value changes, and any read that starts gets the new value.

• Any read that starts before a write operation finishes and finishes after the write operation starts may see the old or new value.

• A test-and-set may only complete successfully if the value of the data is zero at the point when the set would occur.
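The first three verifier rules reduce to an interval test for a read against a single write, and the fourth to a condition on test-and-set success. This is a hypothetical Python sketch, not the thesis's verifier; it handles only one write at a time and treats the boundary cases with strict inequalities, which is an assumption.

```python
from typing import Set


def allowed_read_values(read_start: int, read_end: int,
                        write_start: int, write_end: int,
                        old: int, new: int) -> Set[int]:
    """Values a read may legally return relative to one write."""
    if read_end < write_start:
        return {old}        # finished before the write started
    if read_start > write_end:
        return {new}        # started after the write finished
    return {old, new}       # overlaps the write: either value is legal


def tas_success_is_legal(value_at_set: int, succeeded: bool) -> bool:
    """A test-and-set may succeed only if the word was zero when the set occurred."""
    return (not succeeded) or value_at_set == 0
```

A checker built on these helpers would flag, for example, a read spanning times 10 to 12 that returned the old value after a write finished at time 9.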
4.4.2 Internal Checking

The simulator is peppered with assertions which check for illegal states and combinations. The simulator also has an option allowing one to choose maximum bounds on a random interval for ejecting values from the cache. When this interval is set to one, values are ejected from the cache one time unit after they are placed there, allowing for thorough testing of the protocol.

4.5 Summary

This chapter describes the simulator used to experiment with PHD. The simulator implements the full protocol plus certain extensions, such as local allocation and optional automatic allocation on uninitialized data. The simulator is trace-driven, and can gather many types of statistics for studying the protocol.

The simulator has been used to test the protocol; additional features for debugging include printing events and cache-emptying events. A special verification program was also designed to ensure that the protocol keeps the memory consistent.

The simulator runs 3,999,800 cycles in just under two hours. This represents 4096 allocation requests, 384,417 read requests, and 255,583 write requests, all resulting in a total of 1,496,929 messages being sent. Each node was allocated 0x3000 words of memory for this simulation. This time measurement took place on an unloaded Sparc II with 32 megabytes of DRAM accessing only locally stored files, and was typical of how the simulator was actually run.
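The run figures above imply a few derived quantities, worked out below. The per-operation ratio divides all messages by reads and writes only, so it also absorbs any messages caused by the 4096 allocation requests.

```latex
\[
\frac{3\,999\,800\ \text{cycles}}{7200\ \text{s}} \approx 555.5\ \text{cycles/s}
\qquad \text{(a lower bound, since the run took just under two hours)}
\]
\[
\frac{1\,496\,929\ \text{messages}}{384\,417 + 255\,583\ \text{reads and writes}}
  = \frac{1\,496\,929}{640\,000} \approx 2.34\ \text{messages per operation}
\]
\[
\texttt{0x3000} = 3 \times 16^{3} = 12\,288\ \text{words per node}
\]
```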



Chapter 5


Abstract Analysis 


This chapter presents an abstract model of the Protocol for Hierarchical Directories and then uses the model to show the effects of locality and machine size on several characteristics of the protocol. The model is shown to be valid using results generated by the simulator described in Chapter 4. The model is used to study the average height per operation, the longest path of messages traveled per operation, and the number of messages generated per operation for machine configurations too large to simulate. The results from this chapter become the inputs of an embedded model, described in Chapter 6, which addresses the protocol behavior when it is mapped onto a specific architecture.

Section 5.1 describes the model and a new method of representing the amount of locality in an application. Sections 5.2 through 5.5 discuss the applications and methods used to validate the model. Section 5.6 presents the results of the study, showing the importance of locality as machines increase in size. An alphabetical listing of all of the variables defined in this thesis can be found in Table A.1.

5.1 Modeling Hierarchical Behavior

Before measuring the application-dependent behavior of the protocol, the protocol characteristics to be measured, the application characteristics needed to measure these aspects of the protocol, and a model of the protocol behavior must be defined.




5.1.1 Overview

Since we are primarily interested in understanding how a hierarchical protocol scales as machine size and locality change, we have studied three application characteristics: the average height in the tree that a read or write operation reaches, the length in message hops of the “longest” path traversed in order to satisfy a read or write operation, and the number of messages generated per read or write operation.

Because this chapter does not study the protocol as mapped onto a particular architecture, issues such as whether or not the calculated message-generation rate can be sustained given bandwidth limits are not considered. Similarly, we also assume infinite caches, since finite cache effects complicate the model, clouding the important multiprocessor issues under consideration. Finite cache modeling can always be factored in later [1].

5.1.2 Locality Characteristics

In order to study the behavior of a cache coherence protocol which is highly dependent on locality, we must have some method of expressing the locality present in applications.

We propose a representation of locality tailored to studying hierarchical cache coherence protocols.

1 Kirk Johnson greatly assisted in the development of this locality model.

2 Because all requests are modeled as occurring instantaneously, we consider only two states: valid and invalid. Valid implies that there is a copy in the subtree; invalid implies that there is not.

Figure 5.1: This diagram illustrates the selection of node classes performed by the model. The grey nodes are valid. The selection process starts at the root node, where either the group of valid subtrees or the group of invalid subtrees is chosen. In the left side of the figure, the valid group is chosen. Because the valid class was chosen, another selection must be made at the next level. At this selection, the invalid group is chosen. This means that the node to make the next request will be in the class of nodes that are not valid, but whose parents are valid. In the right side of the figure, the selection process again begins at the root, where the invalid group is chosen. This ends the selection process; the next request will be made by a node that is invalid and whose parent is invalid but whose parent's parent is valid.

Shared data operations are always caused by node requests. Instead of choosing a node to make a request and following that request up the hierarchy, as in the actual protocol, the abstract model chooses which class of nodes a request occurs in. The actual node that makes the request is unimportant; all that matters is what class that node is in with respect to which other nodes have copies of the block. All nodes in a class have lowest valid ancestors of equal height. Choose the node to make a request as follows (see Figure 5.1 for an illustration): start at the root node of a directory tree, and choose from one of two groups: the invalid and the valid subtrees of the root. If the invalid class is chosen, the process stops. If the valid class is chosen, we again choose from two groups: the invalid and the valid subtrees of the valid children of the root. This process continues until an invalid class is chosen, or until the leaf is reached. If an invalid class is chosen, all nodes below that
cl as s are i n the group of nodes that wi 11 nake the next request. If a val i d path down t o the 
1 eaves i s chosen, al 1 1 eaf nodes whi ch are val i d are i n the group of nodes that wi 11 nake the 
next request. 

We calculate the probability of choosing the valid class as follows. We define p_l, the locality parameter of level l, as the a priori probability that the choice will be the valid group when looking down from level l. These locality parameters can be different at each level of the tree. If the request did not a priori come from an already valid subtree, we distribute the probability of where it came from uniformly over all of the children. The locality in an application is thus expressed by this set of locality parameters. For example, in an application where blocks were accessed uniformly by all processors, all p_l would be 0.

This set of locality parameters lets us describe an application's data usage. For greater accuracy, instead of considering an average block, we could consider several classes of blocks, each with its own set of p_l.

5.1.3 Model

The performance model calculates the average height in the tree, the longest path traversed, and the number of messages sent per read and per write operation, using as inputs the locality parameters p_l; w, the write ratio; b, the branching factor; and L, the number of levels in the tree.

Figure 5.2: The Markov model for counting the number of valid children of a valid parent.

Define r to be the fraction of reads to shared data and w to be the fraction of writes to shared data, where r + w = 1.

We first determine q_{l,c}, the probability that c children of a valid node at level l are valid, by constructing a Markov model, as shown in Figure 5.2. We use the solution to this model to calculate the expected number of valid children of a valid node at level l:

$$E[c_l] = \sum_{c=1}^{b} c\, q_{l,c}$$

Using q, we calculate t_l, the probability of taking a valid path while performing a node selection, looking down from level l. We set t_L = 1; this means that the root node will be chosen with probability one, and it simplifies the equations.

$$t_l = p_l + (1 - p_l)\,\frac{E[c_l]}{b} \qquad (5.1)$$
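As a minimal Python sketch of E[c_l] and Equation 5.1, assume the Markov model of Figure 5.2 has already produced q_l, a list in which q_l[c] is the probability that exactly c children of a valid level-l node are valid (c = 0 .. b). The function names and the example distribution are hypothetical; the Markov model itself is not reproduced here.

```python
def expected_valid_children(q_l):
    """E[c_l] = sum over c of c * q_{l,c}; q_l[c] = P(exactly c valid children)."""
    return sum(c * prob for c, prob in enumerate(q_l))


def valid_path_probability(p_l, expected_children, b):
    """Equation 5.1: t_l = p_l + (1 - p_l) * E[c_l] / b.

    With probability p_l the request comes a priori from a valid subtree;
    otherwise its origin is spread uniformly over the b children, of which
    E[c_l] are valid on average.
    """
    if not 0.0 <= p_l <= 1.0:
        raise ValueError("p_l must be a probability")
    return p_l + (1.0 - p_l) * expected_children / b


# Hypothetical radix-4 level where a valid node has one or two valid children.
q = [0.0, 0.5, 0.5, 0.0, 0.0]
ec = expected_valid_children(q)
print(ec, valid_path_probability(0.0, ec, 4), valid_path_probability(1.0, ec, 4))
```

With p_l = 0 (uniform references), t_l reduces to the fraction of valid children, E[c_l]/b; with p_l = 1, every selection follows the valid path.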

Height. Calculating the expected height a read will reach, given the node selection, is straightforward. A read by a node in the class of nodes which are valid will be of height 0. A read by a node in the class of nodes whose parents are valid, but which is not itself valid, will be of height 1. The expected height of a read request, E[h_r], is given in Equation 5.2.

$$E[h_r] = \sum_{h=1}^{L-1} h\,(1 - t_h) \prod_{l=h+1}^{L} t_l \qquad (5.2)$$
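The node-class selection and Equation 5.2 can be cross-checked with a small Monte Carlo sketch. It assumes levels are numbered from 0 (leaves) to L-1 (root), so the fictitious factor t_L is 1 and is omitted, and it stores hypothetical t_l values in a Python list indexed by level (index 0 unused).

```python
import random
from math import prod


def expected_read_height(t):
    """Equation 5.2: sum over h of h * (1 - t_h) * prod_{l>h} t_l."""
    L = len(t)
    return sum(h * (1.0 - t[h]) * prod(t[h + 1:L]) for h in range(1, L))


def sample_read_height(t, rng):
    """One run of the class-selection walk: descend from the root while the
    valid group is chosen; the first invalid choice, at level h, means the
    read must climb to height h. Reaching the leaves gives height 0."""
    for h in range(len(t) - 1, 0, -1):
        if rng.random() >= t[h]:
            return h
    return 0


t = [None, 0.6, 0.8, 0.9]   # hypothetical t_l for a four-level tree
rng = random.Random(1)
n = 200_000
estimate = sum(sample_read_height(t, rng) for _ in range(n)) / n
print(expected_read_height(t), estimate)
```

The closed form and the sampled average should agree to within sampling noise, which illustrates that Equation 5.2 is simply the expectation of the selection walk.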
Figure 5.3: In both of these examples, the grey nodes have copies of the value, and the black node is attempting to perform a write. The write operation will have to reach the top level of the tree in order to complete.


In order to calculate the expected height a write request will reach, we must consider not only whether the requesting node has a copy, but also whether nodes of any other classes have copies. A write must progress upwards until it reaches a height at which only a single node at each level is valid, and those nodes are all ancestors of the writing node. For example, as shown in Figure 5.3, where the black node is requesting the write operation and grey nodes have copies of the value, a write operation would need to reach the top of the tree in both cases. In the first case, the full height of the tree is needed just to reach any other copies. In the second case, although a shorter height is sufficient to locate a node with a copy of the block, the write operation must reach the top of the tree in order to invalidate the other copies. Keeping these rules in mind, we calculate the expected write height, E[h_w].

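$$E[h_w] = \sum_{h=1}^{L-1} h\,(1 - q_{h,1} t_h) \prod_{l=h+1}^{L} q_{l,1} t_l \qquad (5.3)$$

Here q_{l,1} is the probability that exactly one child of a valid node at level l is valid.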

Longest Path. The longest path traversed during a read request is the path of the request up the tree, down to the node that has the block, and then directly to the requesting node, as shown in Figure 5.4. The expected longest read path, E[l_r], is thus:

$$E[l_r] = \sum_{h=1}^{L-1} (2h+1)(1 - t_h) \prod_{l=h+1}^{L} t_l \qquad (5.4)$$

The longest path for a write is up to the highest node, down to all of the copies, back up to the top, and then down from the top to the requesting node. The expected longest write path, E[l_w], is given in Equation 5.5.

Figure 5.4: The left example shows a read request, the right a write request. In both of these examples, the grey nodes have copies of the value, and the black node is making the request. The longest path traversed is shown for both cases. Note that for the write case, there are other equally long paths not shown.


$$E[l_w] = \sum_{h=1}^{L-1} 4h\,(1 - q_{h,1} t_h) \prod_{l=h+1}^{L} q_{l,1} t_l \qquad (5.5)$$

Number of Messages. The calculation of E[m_r], the expected number of messages per read operation, is very similar to that for the expected longest read path. The only difference is that the set of messages sent from the read originator to the top node of the read, to confirm that the read has occurred, must be added in.

$$E[m_r] = \sum_{h=1}^{L-1} (3h+1)(1 - t_h) \prod_{l=h+1}^{L} t_l \qquad (5.6)$$
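Equations 5.4 and 5.6 weight the same height distribution as Equation 5.2 and differ only in the per-height cost: 2h+1 hops for the longest path versus 3h+1 messages. A sketch under the same level-numbering assumption (leaves at 0, root at L-1, fictitious t_L = 1 omitted), with hypothetical t_l values and function names:

```python
from math import prod


def read_height_distribution(t):
    """P(read reaches height h) = (1 - t_h) * prod_{l>h} t_l for h >= 1;
    the remaining probability mass is a local (height 0) read."""
    L = len(t)
    dist = {h: (1.0 - t[h]) * prod(t[h + 1:L]) for h in range(1, L)}
    dist[0] = 1.0 - sum(dist.values())
    return dist


def expected_read_cost(t, cost):
    """Sum of cost(h) * P(height h) over non-local reads (h = 1 .. L-1)."""
    return sum(cost(h) * p for h, p in read_height_distribution(t).items() if h > 0)


t = [None, 0.6, 0.8, 0.9]                               # hypothetical t_l
longest = expected_read_cost(t, lambda h: 2 * h + 1)   # Eq. 5.4
messages = expected_read_cost(t, lambda h: 3 * h + 1)  # Eq. 5.6
print(longest, messages)
```

Because the costs differ by exactly h, E[m_r] - E[l_r] equals E[h_r]: the extra confirmation messages cost one per level climbed.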

The expected number of messages per write operation, on the other hand, requires more knowledge than the longest write path calculation. This is because the number of messages depends on how many nodes have read the block since the last write and therefore need to be invalidated. Recall that E[c_l] is the expected number of valid children of a valid node at level l. For each non-local write, one set of messages is sent from the requester to the highest node in the tree involved in the write, as shown in Figure 5.5; a full fan-out of invalidates and fan-in of acknowledgments is exchanged with all nodes with copies; a final acknowledgment is sent down to the writer; and write ownership is transferred to the writer.
Figure 5.5: This figure illustrates the number of messages sent for a typical write operation. The grey nodes have copies of the value, and the black node is performing the write.

The expected number of messages per write, E[m_w], is thus:

$$E[m_w] = \sum_{h=1}^{L-1} \left(2h + 1 + 2\sum_{i=1}^{h} \prod_{e=i}^{h} E[c_e]\right)(1 - q_{h,1} t_h) \prod_{l=h+1}^{L} q_{l,1} t_l \qquad (5.7)$$
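A sketch of the write-side quantities, following the reconstructed form of Equations 5.3, 5.5 and 5.7. Here q1[l] stands for q_{l,1}, ec[l] for E[c_l], the fictitious level-L factor is taken as 1, and all numeric inputs and function names are hypothetical. Treat it as an illustration of how the terms combine, not as the thesis's own code.

```python
from math import prod


def write_weights(t, q1):
    """P(write climbs to height h) = (1 - q1[h]*t[h]) * prod_{l>h} q1[l]*t[l],
    i.e. above h the valid path has exactly one valid child at every level."""
    L = len(t)
    s = [None] + [q1[l] * t[l] for l in range(1, L)]  # single-valid-path factor
    return {h: (1.0 - s[h]) * prod(s[h + 1:L]) for h in range(1, L)}


def invalidation_messages(h, ec):
    """2 * sum_{i=1}^{h} prod_{e=i}^{h} E[c_e]: invalidates fanned out to, and
    acknowledgments fanned in from, every valid node below the top node."""
    return 2.0 * sum(prod(ec[i:h + 1]) for i in range(1, h + 1))


def write_metrics(t, q1, ec):
    w = write_weights(t, q1)
    height = sum(h * p for h, p in w.items())                        # Eq. 5.3
    longest = sum(4 * h * p for h, p in w.items())                   # Eq. 5.5
    messages = sum((2 * h + 1 + invalidation_messages(h, ec)) * p    # Eq. 5.7
                   for h, p in w.items())
    return height, longest, messages


t = [None, 0.6, 0.8, 0.9]    # hypothetical t_l
q1 = [None, 0.5, 0.5, 0.5]   # hypothetical q_{l,1}
ec = [None, 1.5, 1.5, 1.5]   # hypothetical E[c_l]
print(write_metrics(t, q1, ec))
```

The invalidation term grows as a product of E[c_e] over the levels below the top node, which is why write messages are far more sensitive to locality than read messages.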

5.2 Applications for Model Verification

Three applications have been employed in the validation of the model. The first is a uniform reference pattern, in which every processor is equally likely to reference all of the data. The second mimics a basic relaxation pattern, such as a Jacobi relaxation. The third is a synthetic pattern exhibiting clustering behavior: nodes further away from a fixed "home location" of data access it less frequently than do closer nodes.

5.2.1 Uniform

The uniform reference pattern fits the model exactly. Uniformity implies that every node is equally likely to reference any block. Because of this property, there is no locality, so the entire set of locality parameters p_l should always be zero.

5.2.2 Relaxation

In the particular relaxation we simulated, during every iteration every point of an n-dimensional mesh updates its value by a function of the values of its 2n neighbors.
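For concreteness, here is a minimal Python sketch of one Jacobi iteration on a 2-dimensional mesh (n = 2, so 2n = 4 neighbors). The update function (averaging the neighbors), the fixed boundary, and the name `jacobi_step` are assumptions for illustration; the text only requires some function of the 2n neighbor values.

```python
def jacobi_step(grid):
    """One Jacobi iteration: every interior point becomes the average of its
    four neighbors, computed from old values only. Boundary points are held
    fixed (an assumption of this sketch)."""
    rows, cols = len(grid), len(grid[0])
    new = [row[:] for row in grid]          # write into a copy: pure Jacobi
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            new[i][j] = (grid[i - 1][j] + grid[i + 1][j]
                         + grid[i][j - 1] + grid[i][j + 1]) / 4.0
    return new


g = [[0.0, 0.0, 0.0],
     [0.0, 0.0, 0.0],
     [4.0, 4.0, 4.0]]
print(jacobi_step(g)[1][1])   # (0 + 4 + 0 + 0) / 4 = 1.0
```

Each update performs 2n reads of neighboring points and one write, which is the reference pattern the relaxation trace reproduces.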

Consider a 2-dimensional relaxation implemented on a 2-D grid of processors. The obvious way to embed the problem is to map a contiguous 2-D portion of the relaxation array onto a single processor, such that the nearest neighbors of all of the points in a single processor are either on that processor or on a neighbor of that processor. This embedding, shown in Figure 5.6, is reasonable for a hierarchy as well. For the model, we assume that the relaxation grid is mapped in this fashion.
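The block embedding can be checked with a short sketch. It assumes a square n_points by n_points relaxation array split into square tiles of side `block` (both hypothetical parameter names), and verifies that every point's four neighbors fall on the same processor or on an adjacent one in the processor grid.

```python
def owner(i, j, block):
    """Processor-grid coordinates owning mesh point (i, j) when each processor
    holds a contiguous block x block tile of the relaxation array."""
    return (i // block, j // block)


def neighbors_stay_local(n_points, block):
    """True if every neighbor of every point is on the same processor or on a
    processor adjacent to it in the 2-D processor grid."""
    for i in range(n_points):
        for j in range(n_points):
            pi, pj = owner(i, j, block)
            for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ni, nj = i + di, j + dj
                if 0 <= ni < n_points and 0 <= nj < n_points:
                    qi, qj = owner(ni, nj, block)
                    if abs(qi - pi) + abs(qj - pj) > 1:
                        return False
    return True


print(owner(5, 2, 4), neighbors_stay_local(16, 4))
```

Since a one-step move changes only one coordinate by one, the owning tile index can change by at most one in one dimension, so only tile-edge points generate off-processor references.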

Note that an exact calculation of the read and write heights can be performed for this application. Define R_l and W_l as the number of read and write operations, respectively, which reach level l. These formulas are shown and derived in Equations B.1 through B.3 in Appendix B.

The read and write heights for the application can then be expressed exactly as:

$$\bar{h}_r = \frac{\sum_{l=0}^{L-1} l\, R_l}{\sum_{l=0}^{L-1} R_l} \qquad (5.8)$$

$$\bar{h}_w = \frac{\sum_{l=0}^{L-1} l\, W_l}{\sum_{l=0}^{L-1} W_l} \qquad (5.9)$$

5.2.3 Cluster


The cluster algorithm assumes that there are clusters or groups of processors working on data. Clusters are said to own blocks. The processors within a given cluster are more likely to reference blocks owned by the cluster than blocks owned by other clusters. This model is similar to the one proposed by Qing Yang in [31].

Figure 5.7: In this figure, the black node owns a block. The lighter the color of a node, the less likely it is to access the block.

Define e as the fraction of all operations by node P which occur to its own blocks; e is the defining parameter of a cluster application. As shown in Figure 5.7, P accesses blocks owned by processors in the group of b processors containing but not including P with uniform probability E_1, which is calculated from e. P accesses blocks owned by processors in the group of b^2 processors containing but not including the aforementioned b processors with the smaller uniform probability E_2, and so on. The access probability E_l is calculated as follows:

$$E_l = \begin{cases} e, & l = 0 \\ \dfrac{(1 - e)\, 2^{L-l-1}}{\sum_{i=0}^{L-2} 2^{i}}, & l \in [1, L-1] \end{cases} \qquad (5.10)$$


This formula implies that the frequency of requests to processors in the next largest cluster, but not in the current one, decreases by a factor of two as the clusters increase.
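A sketch of Equation 5.10 as reconstructed above: E_0 = e, and the remaining 1 - e is split over the cluster rings l = 1 .. L-1, halving from each ring to the next. Treating E_l as the total for ring l, spread uniformly over its b^l - b^(l-1) processors, is an assumption of this sketch, as are the function names.

```python
def cluster_probabilities(e, L):
    """E_0 = e for a processor's own blocks; ring l = 1 .. L-1 receives
    (1 - e) * 2**(L - l - 1) / sum_{i=0}^{L-2} 2**i, so each ring gets half
    the share of the ring inside it and all shares sum to 1."""
    if L < 2:
        raise ValueError("need at least two levels")
    norm = sum(2 ** i for i in range(L - 1))
    return [e] + [(1.0 - e) * 2 ** (L - l - 1) / norm for l in range(1, L)]


def per_processor(probs, b):
    """Spread each ring total uniformly over the b**l - b**(l-1) processors
    that are in the level-l cluster but not in the level-(l-1) cluster."""
    return [probs[0]] + [probs[l] / (b ** l - b ** (l - 1))
                         for l in range(1, len(probs))]


p = cluster_probabilities(0.75, 4)   # four-level, radix-4 tree with e = 0.75
print(p, sum(p))
print(per_processor(p, 4))
```

For b = 4 and L = 4 the rings hold 3, 12 and 48 processors, which together with P account for all 64 processors of the simulated machine.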


5.3 Simulation of Applications

The synthetic address traces of three applications were simulated using the simulator described in Chapter 4, in order to determine values for the set of locality parameters with which to check the model. All applications were simulated both for a 2-dimensional, radix 4, four-level tree (L = 4, b = 4) and for a 3-dimensional, radix 8, three-level tree (L = 3, b = 8). In both cases this resulted in a 64-processor simulation. As much memory was allocated to the processors as was necessary to run without incurring cache overflow misses, in order to simulate an infinite cache size.
Table 5.1: These are the values of the parameters used in the simulations.


Uniform. The uniform address trace consists of references, by every processor, every T simulator steps, to a randomly chosen one of r addresses. All addresses are equally likely to be chosen by each processor. The parameter varied in this trace is the percentage of writes, w.

Relaxation. The relaxation address trace consists of cycles of reads to neighbors followed by writes. In the trace, every node simultaneously makes the read requests followed by the write requests for the first block, then the read requests followed by a write request for the second block, and so on. In other words, the grid is updated such that some blocks are updated using values from later iterations and others using values from earlier iterations, similar to a Gauss-Seidel relaxation. There are T simulator cycles between each reference. The amount of the grid assigned to each node was varied across the simulations. The number of iterations, q, is low because the simulator performed a warm start for this application. For the 2-dimensional (b = 4) relaxation, the percentage of writes was 20; for the 3-dimensional (b = 8), the percentage was 14.
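The write percentages quoted for the relaxation trace follow from simple counting, assuming each point update issues one shared read per neighbor and a single write, so the write fraction is 1/(2n+1). A sketch (the function name is hypothetical):

```python
from fractions import Fraction


def relaxation_write_fraction(n_dims):
    """Each update reads its 2n neighbors and writes once: 1 / (2n + 1)."""
    return Fraction(1, 2 * n_dims + 1)


for n in (2, 3):
    f = relaxation_write_fraction(n)
    print(f"{n}-D: {f} = {100 * float(f):.1f}% writes")
```

This gives 1/5 (20%) for the 2-dimensional case and 1/7 (about 14%) for the 3-dimensional case.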

Cluster. The cluster address trace consists of references, by every processor, every T simulator steps, to a randomly chosen one of N addresses, where N is the number of processors. Accesses by a particular processor to self-owned addresses occur with probability e. The probability of references to other clusters is calculated according to the formula described earlier in Equation 5.10. The parameters varied in this trace are the percentage of writes, w, and the base probability e.
(Figure 5.8 panels: Three Level, Radix Eight Tree; Four Level, Radix Four Tree. Horizontal axis: Write Fraction, 0.0 to 1.0.)

Figure 5.8: The locality parameters p_l measured from the uniform application simulation.


5.4 Locality Parameters Measured from Simulation

The simulation allowed us to measure the set of locality parameters for the applications.

Uniform. As predicted, the values of p_l are nearly zero for the uniform application, as shown in Figure 5.8. A high point occurs when the percentage of writes is zero; this behavior is caused by the fact that there are no writes at all. If the simulation were run for a very long time, eventually nearly every node would have a copy, and then there would seem to be correlation in the choice of a subtree: a valid subtree node would be more likely to be selected than an invalid one because there are so many.

Relaxation. As the amount of data per node increases, the application exhibits more and more locality at every level, as expected. Note that there is tremendous locality for the references which reach the higher levels, as there are extremely few of them.

Cluster. For all values of e, the fraction of references by an owner to its own blocks, and of L, the number of levels, the value of p_l decreases as the write fraction increases. This is because a write issued from a more remote node causes p_l to decrease twice: once when the write is issued, to pull the value into the cache, and once when the more common read or write occurs to the owner node.

(Figure 5.9 panels: Three Level, Radix Eight Tree; Four Level, Radix Four Tree. Horizontal axis: Data per Dimension per Node.)

Figure 5.9: The locality parameters p_l measured from the relaxation application simulation.

(Figures 5.10 through 5.12, horizontal axis: Write Fraction.)

Figure 5.10: The locality parameters p_l measured from the cluster application simulation. The base reference fraction e = 0.75.

Figure 5.11: The locality parameters p_l measured from the cluster application simulation. The base reference fraction e = 0.5.

Figure 5.12: The locality parameters p_l measured from the cluster application simulation. The base reference fraction e = 0.25.

Note that as e increases, p_l stays artificially high. This behavior is caused by the extremely high value of e: when e is 0.75, three-quarters of all requests to blocks owned by node P are made by node P. This means that the effect mentioned above, where p_l decreases with write fraction, does not often occur, since a write to the owner node followed by another operation to the owner node (with no intervening writes by other nodes) does not lower the locality parameter.

5.5 Comparison of Model and Simulation

The predicted and simulated average read and write heights are very similar, and confirm that the set of locality parameters is a valid way of expressing the behavior of an application. The predicted number of messages per read and write operation also compares favorably to the simulation.

Uniform. We examined the average read and write height of the uniform application as the write fraction is varied, as shown in Figure 5.13. Note that, as expected, using the values of the locality parameters measured from the simulation produced results identical to those obtained by simply using the value zero, showing that the deviations from zero in the measured p_l are insignificant.

Figure 5.14 compares the predicted number of messages per operation with the simulated number. Although the predicted number of write messages is higher than the simulated number for low values of the write fraction, the predicted and simulated numbers nearly match for the rest of the write fraction range, and the shapes of the curves are very similar.

Relaxation. For the relaxation application, we examined the average height characteristics as a function of the amount of data allocated to each node. The resulting 3-dimensional (radix 8) and 2-dimensional (radix 4) graphs can be seen in Figure 5.15. Note that the numbers shown on the x-axis of the graphs represent the amount of data allocated per dimension per node. In other words, to calculate the actual data per node, cube the number shown in the figure for the 3-dimensional case, and square the number for the 2-dimensional case. The directly calculated average heights mentioned in Section 5.2.2 are also included on the graphs. The exact calculation does not work properly when there is only one data point allocated per processor.

Figure 5.13: This figure shows the average read and write height model predictions as well as the simulated ones for the uniform application. Note that two sets of values for p_l were used: one where p_l was set to zero, and one where p_l was measured from the simulation.

Figure 5.14: This figure compares the predicted number of messages per read and write operation with the simulated values for the uniform application. The values of p_l measured from the simulation were used for the predicted curve.

Figure 5.15: This figure shows the average read and write height model predictions as well as the simulated ones for the relaxation application. Two different predictions are shown: one uses the locality parameter model, and one uses an analytical calculation of the relaxation data motion to directly calculate the heights.

Cluster. Figure 5.16 shows graphs of the write fraction versus the average read and write height characteristics for three values of the base reference parameter: 0.75, 0.5, and 0.25. The model is more accurate for higher base reference values.

Figure 5.17 compares the number of messages per read and write operation predicted from the model with the number of messages recorded as sent by the simulation. Base reference rates of 0.75 and 0.25 are shown in the figure. Note that in these graphs, the number of messages per read and write seems over-predicted for higher values of the base reference rate. In fact, the number of simulated messages is low. This effect is caused by the method of gathering message statistics in the simulation: messages which are sent from a physical processor to itself, even if the virtual nodes being represented change, are not counted. Because of the way the mapping of virtual nodes to physical processors is performed (see Section 2.2 for details), many more messages are sent from a processor to itself when the base reference rate is high.

Figure 5.16: This figure shows the average read and write height model predictions as well as the simulated ones for the cluster application.

Figure 5.17: This figure compares the simulated and the predicted number of messages per read and write operation for the cluster application.

Model Parameter   Radix Eight Trees       Radix Four Trees
w                 0.3                     0.3
b                 8                       4
L                 3, 4, 5, 6              4, 5, 6, 7, 8, 9
N                 64, 512, 4096, 32768    64, 256, 1024, 4096, 16384, 65536

Table 5.2: These are the values of the input parameters for the model.

5.6 Protocol Characterization for Large Machine Sizes

In this section, the verified model is used to predict the behavior of the protocol on machine sizes too large to simulate.

5.6.1 Parameters

In order to simplify the study, several of the model input parameters have been constrained, as shown in Table 5.2. The fraction of writes is fixed at 0.3: a reasonable choice for parallel applications [28], as well as one at which the messages-per-operation calculation is accurate. Trees with two different radices, eight and four, are modeled. The range of machine sizes is chosen to show the trend of the curves.
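The machine sizes in Table 5.2 are consistent with N = b^(L-1), which holds if leaves are at level 0 and the root at level L-1 (an assumption about the level numbering). A one-line check, with a hypothetical function name:

```python
def machine_sizes(b, levels):
    """Processor count of a radix-b tree for each number of levels L,
    assuming N = b ** (L - 1)."""
    return [b ** (L - 1) for L in levels]


print(machine_sizes(8, range(3, 7)))   # [64, 512, 4096, 32768]
print(machine_sizes(4, range(4, 10)))  # [64, 256, 1024, 4096, 16384, 65536]
```

Under this reading, one extra level multiplies the machine size by b, which is why the logarithmic "Machine Size" axis used below can equally be read as a linear "Number of Levels" axis.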

The set of locality parameters is fixed to a single value p for all levels, rather than a set of values for each level. This simplification still provides interesting results, because p = 0 represents a uniform reference input stream and p = 1 represents a completely local input stream; most applications will lie between these two extremes. Furthermore, the values of p_l at different levels seen in the uniform and relaxation applications were very close, and the values in most of the cluster applications were similar.
5.6.2 Average Height

Figure 5.18 shows the average height per request as a function of the machine size and the locality. The “Height/Operation” characteristic is determined by weighting the “Height/Read” and “Height/Write” values by the write fraction. Results for both radix eight and radix four trees are shown. Note that the “Machine Size” axis is plotted on a logarithmic scale; alternatively, the scale can be viewed as linear, relabeling that axis as “Number of Levels” with the values 3-6 for radix eight, and 4-9 for radix four.
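The per-operation characteristics are read and write values weighted by the write fraction. A minimal sketch, assuming the weighting is (1 - w) for reads and w for writes (so r = 1 - w), with a hypothetical function name and hypothetical input values:

```python
def per_operation(read_value, write_value, w):
    """Combine a read-side and a write-side characteristic using the write
    fraction w, as for "Height/Operation" in Figure 5.18."""
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must be a fraction")
    return (1.0 - w) * read_value + w * write_value


# Hypothetical heights, using the fixed write fraction w = 0.3 of Table 5.2.
print(per_operation(1.0, 3.0, 0.3))
```

The same weighting applies to the longest-path and message counts per operation discussed in the following sections.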

There are two trends to observe. First, note the importance of locality, especially with larger machine sizes. The second effect is that of machine size. The average heights all grow sub-linearly with machine size, and nearly linearly with the number of levels. Note, however, that with a large machine size and low locality, nearly the entire tree is traversed during an average request. This behavior is clearly unacceptable.

5.6.3 Longest Path

Figure 5.19 shows the longest path per request as a function of the machine size and the locality. The form of these results is very similar to that of the average height. The main point to note about these graphs is the sheer number of nodes each request will, on average, have to pass through. Even if the network bandwidth were large enough to support this many requests, the nodes need to examine each message passing through, and would likely have long queues of pending messages to examine.

5.6.4 Number of Messages

The number of messages sent per request as a function of the machine size and the locality is shown in Figure 5.20. The shape of the curve of the number of messages sent per read is similar to those discussed earlier. The curves for the number of messages sent per write and the number of messages sent per operation (the weighted combination of reads and writes), on the other hand, are different.

Because the number of messages sent per write depends not only on the distribution of the nodes with copies of the block, but also on the number of nodes with copies of the block, the effects of machine size and locality are much more pronounced. The other curves vary logarithmically, with the main gains for locality in the 0.75 to 1 range. In contrast, the curve of messages per write versus locality is barely affected by small changes at high locality; as the locality decreases, the number of messages sent per write increases polynomially. The degree of the polynomial varies with machine size, implying that while applications with poor locality may perform reasonably on small machines, they will swamp large machines.

Figure 5.18: This figure shows the predictions for average height per operation as a function of machine size and of locality. The left graphs are for radix eight trees; the right graphs are for radix four.

Figure 5.19: This figure shows the predictions for the length of the longest path traveled per request as a function of machine size and of locality. The left graphs are for radix eight trees; the right graphs are for radix four.

Figure 5.20: This figure shows the predictions for the number of messages sent per request as a function of machine size and of locality. The left graphs are for radix eight trees; the right graphs are for radix four.

The number of messages sent per write and per operation as a function of machine size is actually sub-linear. As a function of the number of levels, however, the number of messages sent is definitely quadratic.

5.7 Issues

Although the results described in this chapter are of interest in examining the performance of the protocol, there are many extensions that could be made to provide more insight into the abstract protocol behavior. Many applications should be analyzed to determine the precise meaning of the set of locality parameters. A better estimate of the locality parameter could be used. Finally, the data in an application could be divided into sets, and the locality parameters calculated separately for each one.

Currently, the locality parameter set can only be derived for an application by measuring the parameters from simulation. We have performed some initial work towards deriving the set of locality parameters from a spatial locality model of an application, such as that available for the cluster application. The derivation works best, however, for applications which exhibit a very high degree of clustering. More work needs to be done in this area.

Using a flat set of locality parameters is not necessarily realistic. For large applications running on massively parallel machines, we might expect less sharing to occur near the top of the hierarchy, and more at the bottom. Studies of applications need to be done to provide insight into what the hierarchical locality of applications is actually like.

For applications which have a large variance in the types of data referencing, several sets of locality parameters can be used to avoid averaging effects. This would allow one to separate widely shared data, such as synchronization variables, from less used data. This separation is useful because an application may stall due to the high sharing of synchronization variables. This method might also provide new insight into the interactions between shared data and program execution-time behavior.

5.8 Summary

In this chapter we have proposed a method of expressing locality in applications mapped onto hierarchical architectures. We have used this model to predict the average height per request, the average longest path per request, and the average number of messages sent per request. We used three applications in order to validate the model: a uniform reference stream, a relaxation algorithm, and a clustering data-reference stream.

After validating the model, we employed it in the prediction of the abstract performance of very large machines as a function of the locality, studying how the model outputs varied with machine size and locality. The most important result is that locality is extremely important in an application. As machine sizes grow, locality becomes increasingly important for reducing latency.

We will use the abstract model as input to an embedded model in Chapter 6. The embedded model describes how the protocol runs when mapped onto particular machines. This will allow us to study how the protocol behaves under conditions where requests are not allowed to send an unlimited number of messages without penalty.



Chapter 6


Embedded Analysis


This chapter extends the abstract analysis of Chapter 5 to show how embedding PHD into a machine affects the behavior of the protocol. We use the mapping described in Section 2.2 to embed the protocol into a k-ary n-cube. The embedded model describes this mapping, as well as the configuration of the architectures being studied.

In our study we find that multithreading is only useful for approximately two to four threads; interleaving more than that does not decrease the overall latency. For small machines and high-locality applications, this limitation is due mainly to the length of the running threads. For large machines with medium to low locality, this limitation is due mainly to the large protocol overhead.

We also consider the addition of controllers to the processing nodes. We will see that the gains from adding these controllers are not large enough to justify hardware which is more expensive than processors. In no case does the addition of the controllers save more time than doubling the number of processors.

We first provide a brief description of the embedded model in Section 6.1 and derive the necessary inputs. We then characterize the behavior of the mapped protocol in Section 6.2 for several different architectures. Finally, we discuss what further issues need to be studied in Section 6.3. An alphabetical listing of all of the variables defined in this thesis can be found in Table A.1.
6.1 An Embedded Model

The embedded model is basically the model derived by Johnson in [13], slightly modified to suit our purposes. The abstract model is used to generate the inputs to the embedded model.

The embedded model is used to study two main architectural configurations: a machine in which protocol activities are handled by the same processor which is attempting to do work, as outlined in Section 2.2, and a machine in which protocol activities are handled by a separate controller.

6.1.1 Model Overview

Johnson developed a framework for modeling how communication affects performance. His framework consists of three parts: a network model, an application model, and a transaction model. These three are combined into a single model in order to provide feedback between the subsystems; for example, nodes will be unable to inject messages into the network faster than the transaction latencies allow. The model is fully described in [13]; only the parts of the model which have been changed for this analysis will be discussed in detail.

The embedded model directly uses the application and the network models. An application consists of threads running on processors. The threads run until they make off-node requests (communication transactions). In the absence of multithreading, the threads suspend until their transactions finish. If there is multithreading, and there are still runnable threads, a context switch occurs, and a new thread is started.

In Johnson's model the application model invokes communication transactions; in our embedded model it invokes off-node requests to shared memory. The off-node requests are modeled identically to the transaction model, except that the meaning of one of the parameters is different: $T_f$, the fixed delay of Johnson's model, represents the time necessary for a non-leaf node to process protocol requests. As such, it becomes a function of how many messages are sent.

The network model is used to determine average message latency, given an input message size, injection rate, and communication distance. The network is assumed to be a k-ary n-dimensional mesh, with separate unidirectional channels in both directions.
Symbol   Description
R        Average thread run length between successive requests to shared memory.
M_l      Average time to satisfy a locally satisfiable request to shared memory.
T_s      Context switch time.
M_r      Average time to process a protocol message invoked on a processor.
N_i      Average network interface overhead.
C        Number of words in a cache line.
f        Number of flits per word.
p_m      Degree of hardware multithreading.

Table 6.1: The additional basic input parameters needed for the embedded model.


Symbol   Description
c        Average number of messages in the critical path of a non-local shared-memory request.
g        Average number of messages per non-local shared-memory request.
B        Average message size (in flits).
d        Average distance a message travels (in hops).
k_d      Average distance a message travels in each dimension.
T_r      Average thread run length between successive non-locally satisfiable requests.
T_f      Non-network overhead of satisfying a non-local shared-memory request.

Table 6.2: The derived input parameters needed for the embedded model.


6.1.2 Model Inputs

The embedded model takes many parameters as input. Some of these parameters have been discussed in the abstract analysis chapter, and vary depending on the application. Other parameters, shown in Table 6.1, need to be specified only when the protocol is mapped to an architecture. The embedded model uses a third set of parameters, derived from the first two sets, as its actual inputs. This third set is listed in Table 6.2, and will be derived in this section.

Locally Satisfiable Shared-Memory Requests  We first calculate the fraction of requests to shared memory that are locally satisfiable, for use in later equations. We determine the expected fraction of locally satisfiable reads by calculating the probability that a request will come from a node in the class of nodes which are valid.
$$E[Z_r] = \prod_{l=1}^{L} t_l \qquad (6.1)$$

We find the expected fraction of locally satisfiable writes by calculating the probability that a request will come from a node in the class of nodes which are valid, and all of whose ancestors are the only valid nodes in their set of siblings.

$$E[Z_w] = \prod_{l=1}^{L} w_l \qquad (6.2)$$

The expected fraction of locally satisfiable shared-memory requests is just the average of $Z_r$ and $Z_w$, weighted by the read and write fractions $r$ and $w$.

$$E[Z] = r\,E[Z_r] + w\,E[Z_w] \qquad (6.3)$$

Number of Messages in Critical Path  The calculation of the expected number of messages in the critical path of a non-locally satisfiable shared-memory request, $c$, is similar to the calculation of the longest path for an operation in the abstract model (Equations 5.4 and 5.5).

There are, however, two differences. First, we condition the calculation on non-local operations by dividing by the fraction of non-local requests. Second, we only want to consider those messages which actually need to be sent off-node. In the embedding, a parent node and its $b$ children for a particular block correspond to $b$ processing nodes. This means that one of the children is situated on the same physical processor as its parent, so that, on average, only a fraction $(b-1)/b$ of the messages between a node and its parent leave the processor. Equation 6.4 gives the expected number of messages for a read request.

$$E[c_r] = \frac{1}{1 - E[Z_r]} \sum_{h=1}^{L} \left( \frac{b-1}{b}\, 2h + 1 \right) (1 - t_h) \prod_{l=h+1}^{L} t_l \qquad (6.4)$$

The critical path for a write request contains a fan-in and fan-out to all nodes with copies of the block. We make the reasonable assumption that at least one of these paths will contain no message sends between nodes mapped to the same processor, so the expected number of messages in the critical path for a write operation is just the expected number in the longest path for a write, conditioned on the non-local factor, as shown in Equation 6.5.
$$E[c_w] = \frac{E[l_w]}{1 - E[Z_w]} \qquad (6.5)$$

The expected number of messages in the critical path of a general request is calculated by weighting $c_r$ and $c_w$ by the write fraction.

$$E[c] = r\,E[c_r] + w\,E[c_w] \qquad (6.6)$$

Number of Messages  The calculation of the expected number of messages sent for non-locally satisfiable shared-memory requests, $g$, is similar to that of the number of messages sent in the abstract model ($m_r$ and $m_w$ in Equations 5.6 and 5.7). There are two differences between these calculations, the same as those described for Equations 6.4 and 6.5 above. The expected number of messages for a read operation is given in Equation 6.7.

$$E[g_r] = \frac{1}{1 - E[Z_r]} \sum_{h=1}^{L} \left( \frac{b-1}{b}\, 3h + 1 \right) (1 - t_h) \prod_{l=h+1}^{L} t_l \qquad (6.7)$$

A write operation generates many messages, some of which are expected to stay local to a physical processor. Since here we are counting the total number of messages sent rather than the critical path length, we do weight by the branching factor, as shown in Equation 6.8.

$$E[g_w] = \frac{1}{1 - E[Z_w]} \sum_{h=1}^{L} \frac{b-1}{b} \left( 2h + 1 + 2 \sum_{i=1}^{h-1} \prod_{e=1}^{i} c_e \right) (1 - w_h) \prod_{l=h+1}^{L} w_l \qquad (6.8)$$

The expected number of messages sent by a general request is calculated by weighting $g_r$ and $g_w$ by the write fraction.


$$E[g] = r\,E[g_r] + w\,E[g_w] \qquad (6.9)$$
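Equations 6.3, 6.6, and 6.9 share one step: a read-specific and a write-specific quantity are averaged using the read and write fractions. The Python sketch below shows only that step. The function name is hypothetical, and the sketch assumes (our assumption, not stated in the text) that every shared-memory request is either a read or a write, so that $r = 1 - w$.

```python
def weight_by_operation(read_value: float, write_value: float, w: float) -> float:
    """Combine a read-specific and a write-specific quantity as r*x_r + w*x_w.

    Assumption (ours, not stated in the thesis): every shared-memory request
    is either a read or a write, so the read fraction is r = 1 - w.
    """
    if not 0.0 <= w <= 1.0:
        raise ValueError("write fraction w must lie in [0, 1]")
    r = 1.0 - w
    return r * read_value + w * write_value


# Hypothetical per-type values combined with the write fraction of Table 6.4.
print(weight_by_operation(read_value=4.0, write_value=10.0, w=0.3))
```

The same helper applies unchanged to $Z$, $c$, and $g$; only the inputs differ.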

Flits per Message  The expected number of flits sent per message depends on the exact machine to which the protocol is mapped. Define $f$ as the number of flits per word, and $C$ as the cache line size in words. We assume 32-bit words for the purpose of this calculation. A read sends the 4-word find-lowest-common-for-read message, the 3-word readm message, the $(3+C)$-word read-data message, and the $(4+C)$-word confirm-value message. These messages are described in Table 2.4.


USA =/rVs (11+a 3 "ti 3+q (1 ~ th) n *• (^°) 

1 -Z r fcl 3h+l l=hhl 

A write operation sends the 4-word find-lowest-common-for-write message, the 3-word lock message, the 4-word ack and ack1 messages, the $(2+C)$-word s-write-own message, and the 3-word write-ok message.
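For concreteness, the message sizes just listed can be tabulated in flits with a short Python sketch. The message names follow the text; the helper functions are hypothetical, and the example uses the shared values $C = 8$ words and $f = 2$ flits per word from Table 6.4.

```python
from __future__ import annotations


def message_words(C: int) -> dict[str, int]:
    """Message sizes in 32-bit words as listed in the text (C = words per cache line)."""
    return {
        # messages sent by a read
        "find-lowest-common-for-read": 4,
        "readm": 3,
        "read-data": 3 + C,
        "confirm-value": 4 + C,
        # messages sent by a write
        "find-lowest-common-for-write": 4,
        "lock": 3,
        "ack": 4,
        "ack1": 4,
        "s-write-own": 2 + C,
        "write-ok": 3,
    }


def message_flits(C: int, f: int) -> dict[str, int]:
    """Convert word counts to flits, with f flits per word."""
    return {name: f * words for name, words in message_words(C).items()}


# Shared parameters of Table 6.4: C = 8 words, f = 2 flits per word.
for name, flits in message_flits(C=8, f=2).items():
    print(f"{name:30s} {flits:3d} flits")
```

The two data-carrying messages dominate: with these values read-data and confirm-value are several times longer than the control messages.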


§B W \ =f 


1 -Z, 


L -1 

£ 

h= i 


I ( 7) h+( 2 +Q +( 7) Yd=i nti-i c e (, u \ 
[ 2h+l+2 Ei=iU^Uc e ( Vhh ' 


n W (6.H) 

i=hj-i / 


The expected number of flits per general message sent, $B$, is weighted by the number of messages sent by each type of operation as well as the frequency of each operation.


rg,B t + <8 W B W (6 . 12) 

rg r +*g w 

Distance per Message  The expected distance in hops that the average message travels, $d$, depends on the radix $k$ and the number of dimensions $n$ of the machine to which the protocol is mapped. Note that the embedding is such that $b$, the branching factor of the tree, is equal to $2^n$.

We first determine $d_l$, the expected number of hops needed for a message sent between a node at level $l$ and its parent, where the parent and the node are located on separate processors. This is just the average distance for a 2-ary n-cube [2][22], scaled by the dilation caused by higher levels.




$$d_l = 2^{l-1}\, \frac{n b}{2(b-1)} \qquad (6.13)$$

We also need to determine $d'_l$, the number of hops that a hierarchy-circumventing message, such as read-data, will take. A message of this form is sent from a node directly to another node. The two nodes are guaranteed not to share a common sub-cube smaller than that of their lowest common ancestor. The values were counted empirically, and are enumerated in Table 6.3.

Radix \ Level   1      2       3        4        5          6          7          8
4               4/3    3       37/6     149/12   199/8      2389/48    9557/96    12743/64
8               12/7   57/14   237/28   957/56   3837/112

Table 6.3: This table shows the empirically determined number of hops taken by a message sent directly from a node in a subtree to a node outside that subtree, if the lowest common ancestor of those two nodes is at level $l$. This calculation assumes a uniform distribution.

The expected distance per read message is given in Equation 6.14. Note that the number of hops between each pair of adjacent levels is summed and then multiplied by the number of tree traversals. We do not worry about excluding the messages which are not actually sent off-node; the difference is negligible.

$$E[d_r] = \frac{1}{1 - E[Z_r]} \sum_{h=1}^{L} \frac{3 \sum_{i=1}^{h} d_i + d'_h}{3h + 1} (1 - t_h) \prod_{l=h+1}^{L} t_l \qquad (6.14)$$

The expected distance per write message is calculated similarly in Equation 6.15.


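$$E[d_w] = \frac{1}{1 - E[Z_w]} \sum_{h=1}^{L} \frac{2 \sum_{i=1}^{h} d_i + d'_h + 2 \sum_{i=1}^{h-1} d_i \prod_{e=1}^{i} c_e}{2h + 1 + 2 \sum_{i=1}^{h-1} \prod_{e=1}^{i} c_e} (1 - w_h) \prod_{l=h+1}^{L} w_l \qquad (6.15)$$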


The expected distance per general message sent, $d$, is weighted by the number of messages sent by each type of operation as well as the frequency of each operation.


$$E[d] = \frac{r\,E[g_r]\,E[d_r] + w\,E[g_w]\,E[d_w]}{r\,E[g_r] + w\,E[g_w]} \qquad (6.16)$$


Distance per Dimension  The average distance a message travels in each dimension, $k_d$, assuming independence, is just $d/n$. In Johnson's model, when $k_d$ is less than one, the average per-hop latency for the head of a message is fixed to one.
Run Length Between Requests  The average thread run length between successive requests to shared memory, $R$, is one model input. Another is $M_l$, the average time needed to satisfy a locally satisfiable shared-memory request, i.e., one that does not cause any off-node traffic. The average thread run length between successive requests to non-locally satisfiable shared memory, $T_r$, is a function of these two parameters. For the purpose of this model, $T_r$ is considered to be the useful work done by the processor.

$$E[T_r] = \frac{R + M_l\,E[Z]}{1 - E[Z]} \qquad (6.17)$$
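Equation 6.17 can be read as a geometric-series argument: each shared-memory reference is preceded by $R$ cycles of work, and a fraction $Z$ of references are satisfied locally at a cost of $M_l$ cycles, so on average $Z/(1-Z)$ local hits, each followed by another $R$ cycles, occur before a non-local request. The Python sketch below encodes that reading; the function name is hypothetical, and $Z = 0.75$ in the example is an illustrative value, not one reported in the text.

```python
def run_length_between_nonlocal(R: float, M_l: float, Z: float) -> float:
    """T_r = (R + M_l*Z) / (1 - Z), one reading of Eq. 6.17.

    R:   run length before each shared-memory reference (cycles)
    M_l: cost of a locally satisfiable reference (cycles)
    Z:   fraction of references satisfied locally, 0 <= Z < 1
    """
    if not 0.0 <= Z < 1.0:
        raise ValueError("Z must lie in [0, 1)")
    # R + (Z / (1 - Z)) * (M_l + R) simplifies to (R + M_l*Z) / (1 - Z).
    return (R + M_l * Z) / (1.0 - Z)


for Z in (0.0, 0.5, 0.75):
    print(Z, run_length_between_nonlocal(R=500, M_l=10, Z=Z))
```

With $R = 500$ and $M_l = 10$ (Tables 6.4 and 6.5), $Z = 0$ gives exactly $R$, while $Z = 0.75$ gives 2030 cycles, roughly $4R$.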

Non-network Overhead  The mapping of non-leaf nodes onto the same processors as leaf nodes guarantees that all processors will need to spend some time processing protocol transitions instead of performing actual work. We model this non-network overhead, $T_f$, as a function of $g$, the average number of messages sent per operation; $M_r$, the average time in processor cycles needed to handle a protocol request; $b$, the branching factor; and $N_i$, the time taken up by the network interface.

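If every node is sending $g$ messages on average per request, every node will have to process $g$ messages. The cost of processing a message is $M_r$ if it stays on the current processor, and $M_r + N_i$ if it comes in from another processor. Since one of a parent's $b$ children shares its processor, a fraction $(b-1)/b$ of the messages come in from another processor.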

$$E[T_f] = E[g]\,M_r + \frac{b-1}{b}\,E[g]\,N_i \qquad (6.18)$$
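Equation 6.18 charges every message $M_r$ cycles and adds the network-interface cost $N_i$ only for the fraction $(b-1)/b$ of messages that arrive from another processor. A minimal Python sketch follows; the function name is hypothetical, $g = 6$ is an illustrative value, and $M_r = 20$, $N_i = 15$, and $b = 8$ come from Tables 6.4 and 6.5.

```python
def non_network_overhead(g: float, M_r: float, N_i: float, b: int) -> float:
    """T_f = g*M_r + ((b - 1)/b)*g*N_i  (Eq. 6.18).

    Each of the g messages costs M_r cycles of protocol processing; only the
    (b - 1)/b of messages that arrive from another processor also pay N_i
    cycles of network-interface overhead.
    """
    return g * M_r + (b - 1) / b * g * N_i


# Illustrative g = 6 messages per request; M_r, N_i, b from Tables 6.4 and 6.5.
print(non_network_overhead(g=6, M_r=20, N_i=15, b=8))
```

With these inputs the processing term (120 cycles) outweighs the interface term (78.75 cycles), which is why the results below are so sensitive to $M_r$.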


6.1.3 Model Constraints

The model developed by Johnson is only valid under certain conditions: the available parallelism must be smaller than the communication transaction latency, so that despite multithreading, the processors will be idle part of the time. In the embedded model, the extra time during which threads would normally be waiting can be used to support protocol transactions.

In an architecture with no controller, where every processor both runs threads and supports protocol transactions, the analysis is only valid as long as the thread run length, the context switch time, and the protocol transaction overhead time do not exceed the transaction latency time, as shown in Equation 6.19.





$$T_t > p_m (T_s + T_f) + (p_m - 1) T_r \qquad (6.19)$$


When this condition is false, the average inter-transaction issue time is limited as follows:


$$t_t = T_r + T_s + T_f \qquad (6.20)$$

When the protocol requests are handled by a controller, there are two constraints that must be met. First, the transaction latency time must be greater than the thread run length and the context switch time:


$$T_t > p_m T_s + (p_m - 1) T_r \qquad (6.21)$$

When this condition is false, the average inter-transaction issue time is limited to $T_r + T_s$. Second, the transaction latency time plus the thread run length must also exceed the time required by the controller to process the protocol requests:


$$T_t > p_m T_f - T_r \qquad (6.22)$$


When this condition is false, the protocol is limited by the speed of the controller to a time of no less than:


$$t_t = T_f \qquad (6.23)$$


When both of these conditions are false, the protocol is limited by the larger of the above transaction issue times.
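The constraints of Equations 6.19 to 6.23 can be collected into one function that reports the floor they impose on the inter-transaction issue time $t_t$. This is a sketch under stated assumptions: the transaction latency $T_t$ comes from Johnson's network and transaction model, which is not reproduced here, so it is passed in; the function name is hypothetical; and when no constraint is violated the function returns None rather than an issue time, since this section does not give that formula.

```python
from typing import Optional


def issue_time_floor(T_t: float, T_r: float, T_s: float, T_f: float,
                     p_m: int, controller: bool) -> Optional[float]:
    """Floor on the inter-transaction issue time t_t from Eqs. 6.19-6.23.

    Returns None when none of the validity conditions is violated.
    """
    if not controller:
        # Eq. 6.19: threads plus protocol work must fit inside the latency.
        if T_t > p_m * (T_s + T_f) + (p_m - 1) * T_r:
            return None
        return T_r + T_s + T_f  # Eq. 6.20
    floors = []
    # Eq. 6.21: run lengths and context switches must fit inside the latency.
    if not T_t > p_m * T_s + (p_m - 1) * T_r:
        floors.append(T_r + T_s)
    # Eq. 6.22: the controller must keep up with the protocol work of p_m threads.
    if not T_t > p_m * T_f - T_r:
        floors.append(T_f)  # Eq. 6.23
    # When both conditions fail, the larger floor applies.
    return max(floors) if floors else None


# Illustrative inputs: T_t = 1000, T_r = 500, T_s = 20, T_f = 200 cycles.
for p_m in (1, 2, 4):
    print(p_m,
          issue_time_floor(1000, 500, 20, 200, p_m, controller=False),
          issue_time_floor(1000, 500, 20, 200, p_m, controller=True))
```

With these illustrative numbers, raising $p_m$ to 4 violates Equation 6.19 (floor $T_r + T_s + T_f = 720$) and, with a controller, Equation 6.21 (floor $T_r + T_s = 520$).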


6.2 Protocol Characterization


There are several fundamental questions we must address. The first is the smallest average inter-transaction issue time that can be sustained. This time depends on the degree of multithreading: with more threads running, more latency can be hidden. Multithreading is only useful up to a certain point, however. Another important question is what sort of overhead is seen, and what its sources are. Are the limits set by protocol overhead or by network latency? This section first describes the parameters chosen for the study, then shows the results of the study.

Shared Model Parameters
T_s    20 cycles
M_l    10 cycles
N_i    15 cycles
C      8 words
f      2 flits
p_m    1, 2, 4
p      0 to 0.9
w      0.3
b      8
L      3, 4, 5, 6
N      64, 512, 4096, 32768

Table 6.4: These are the values of the input parameters which are shared by the different architectures.

6.2.1 Parameters 

We chose to study three machine configurations, representing a variety of architectures. Across all three configurations, particular parameters, listed in Table 6.4, were held constant, since we were most interested in varying the other parameters.

Figure 6.1 shows the number of flits per message, and the distance traveled per message, as a function of machine size and locality. These parameters are used as input to the embedded model. The number of flits per message is higher for high locality; this effect is caused by the domination of longer messages, such as read-data. The more nodes which need to be invalidated, the lower the number of flits per message will be. The distance is comparatively large for high-locality large machines because the distance grows exponentially as one ascends the hierarchy, yet all levels of the hierarchy are assigned an equal locality parameter. This implies that further studies with a set of graduated locality parameters rather than flat ones might be interesting. The number of critical messages in an operation, $c$, and the total number of messages sent per operation, $g$, are similar to $l$ and $m$ graphed in Figures 5.19 and 5.20, and are therefore not shown here.

Figure 6.1: This figure shows the predictions for the average number of flits needed per message, and the average distance that a message travels, as functions of machine size (up to 32768) and of locality.

Model Parameters   Optimistic-Optimistic   Optimistic-Pessimistic   Pessimistic-Optimistic
R                  500                     500                      50
M_r                20                      100                      20

Table 6.5: These are the values of the input parameters which are varied across the different architectures.

Table 6.5 lists the values of the parameters which were varied across the architectures. These values correspond to three situations:

1. Optimistic-Optimistic: The run length between shared-memory references is long (such as on the J-Machine, where floating-point operations are implemented in software), and the protocol overhead is low.

2. Optimistic-Pessimistic: The run length between references is long, and the protocol overhead is high (for example, if the protocol were implemented entirely in software).

3. Pessimistic-Optimistic: The run length between references is very short, and the protocol overhead is low.
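The parameter settings of Tables 6.4 and 6.5 can be written down as a small configuration. In the Python sketch below the class and variable names are hypothetical; the values are those listed in the tables. The final assertion records an arithmetic consequence of the tables: with $b = 8$, each machine size equals $b^{L-1}$ for the corresponding number of levels $L$.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Architecture:
    name: str
    R: int    # run length between shared-memory references (cycles)
    M_r: int  # time to process one protocol message (cycles)


# Shared parameters (Table 6.4).
SHARED = {"T_s": 20, "M_l": 10, "N_i": 15, "C": 8, "f": 2, "w": 0.3, "b": 8}
THREADS = (1, 2, 4)
LOCALITY_RANGE = (0.0, 0.9)
LEVELS = (3, 4, 5, 6)
MACHINE_SIZES = (64, 512, 4096, 32768)

# Varied parameters (Table 6.5).
ARCHITECTURES = (
    Architecture("Optimistic-Optimistic", R=500, M_r=20),
    Architecture("Optimistic-Pessimistic", R=500, M_r=100),
    Architecture("Pessimistic-Optimistic", R=50, M_r=20),
)

# Arithmetic consequence of the tables: N = b**(L - 1) for each listed pair.
assert all(SHARED["b"] ** (L - 1) == N for L, N in zip(LEVELS, MACHINE_SIZES))

for arch in ARCHITECTURES:
    print(arch.name, arch.R, arch.M_r)
```

Freezing the dataclass keeps each configuration immutable, so a sweep over machine sizes and localities cannot accidentally modify one architecture's parameters.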

For all three architectures, we additionally consider the case where the non-leaf protocol overhead is handled by a separate controller.

Figure 6.2 shows the average run length between requests to off-node shared memory, $T_r$, and the non-network-related protocol processing overhead, $T_f$. The run length is an input to the embedded model, and was calculated from the average run length between requests to shared memory, $R$, in Equation 6.17. Note that $T_r$ varies from almost exactly $R$ for machines with no locality, to approximately $4R$ for small machines with a locality of 0.9, and $2R$ for large machines with a locality of 0.9. The overhead, given by Equation 6.18, is another input to the model, and is mainly affected by the number of messages sent to satisfy an operation.

6.2.2 Architectures Without a Separate Cache Controller

In all three architectures studied, multithreading is only useful up to two threads. In other words, interleaving more than two threads does not increase the transaction issue rate. For small machines and high-locality applications, this limitation is due mainly to the length of the running threads. For large machines with medium to low locality, this limitation is due mainly to the protocol overhead being too large.

Inter-Transaction Issue Time

Figure 6.3 shows the average inter-transaction issue time for one thread and for two threads. Note that increasing the number of threads from one to two provides little speedup. As expected, lower protocol processing times produce much better transaction issue times.

Since the run length varies for different machine sizes and localities, we must look at what percentage of time is taken up by protocol overhead.

Protocol Overhead 

We examine the protocol overhead in order to see how much of the transaction latency is caused by overhead and how much represents work being done. Overhead ($O$) is defined as the fraction of the average transaction issue time not spent running:

$$O = 1 - \frac{T_r}{t_t} \qquad (6.24)$$

Figure 6.4 shows these results.

[Figure 6.2 panels: one row per architecture, including Optimistic-Pessimistic (R = 500; M_r = 100) and Pessimistic-Optimistic (R = 50; M_r = 20); x-axis: machine size, up to 32768.]
Figure 6.2: The left half of this figure shows the predictions for the average run length between off-node references to shared memory as a function of machine size and of locality. The right side shows the predicted protocol processing overhead time (dependent on the number of messages sent) per off-node shared-memory request.

[Figure 6.3 panels: Optimistic-Optimistic (R = 500; M_r = 20), Optimistic-Pessimistic (R = 500; M_r = 100), Pessimistic-Optimistic (R = 50; M_r = 20); y-axis: transaction issue time; x-axis: machine size, up to 32768.]
Figure 6.3: This figure shows the predictions for the average inter-transaction issue time as a function of machine size and of locality. The graphs on the left side are for no multithreading ($p_m = 1$); the graphs on the right are for a multithreading of two ($p_m = 2$).
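Taking $T_r$ as the time spent running, as in the overhead definition above, an issue time pinned at the no-controller floor of Equation 6.20 gives $O = 1 - T_r/(T_r + T_s + T_f) = (T_s + T_f)/(T_r + T_s + T_f)$. The Python sketch below computes $O$ and checks that identity; the function name is hypothetical and the inputs are illustrative (only $T_s = 20$ comes from Table 6.4).

```python
def protocol_overhead(T_r: float, t_t: float) -> float:
    """O = 1 - T_r / t_t: fraction of the issue time not spent running threads."""
    if t_t <= 0:
        raise ValueError("inter-transaction issue time must be positive")
    return 1.0 - T_r / t_t


# Illustrative values: T_r = 500, T_s = 20 (Table 6.4), T_f = 200 cycles.
T_r, T_s, T_f = 500.0, 20.0, 200.0
floor = T_r + T_s + T_f  # issue-time floor of Eq. 6.20
O = protocol_overhead(T_r, floor)
assert abs(O - (T_s + T_f) / floor) < 1e-12
print(f"O = {O:.3f}")
```

Under this reading, once the issue time sits at its floor, the overhead depends only on how $T_s + T_f$ compares with the run length, which is why a short run length (the Pessimistic-Optimistic case) is so damaging.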

The Optimistic-Optimistic case does best, as expected. The Optimistic-Pessimistic case is tolerable for small machines with high locality. The Pessimistic-Optimistic case, on the other hand, shows extremely high overhead for nearly all conditions. If the typical run length is only 50 cycles, as assumed for this case, the protocol processing time needs to be reduced before this system can be used effectively. Note that there is very little speedup from going to two threads; in general, only small machines with poor locality benefit.

6.2.3 Architectures With a Separate Cache Controller

In order to increase the performance of the protocol, we consider the case where a separate controller exists to handle protocol requests. This situation will only be beneficial in two cases: first, where the controller can be added to the system more cheaply than another processor, and second, where the controller can be designed to be significantly faster than a processor. If neither of these conditions holds, there is no benefit to using a controller.

We model the architectures with a separate cache controller by allowing the inter-transaction issue time to decrease until it reaches the limits caused by either the protocol overhead or the run-length overhead. We assume that the controller operates at the same speed as the processor did in the earlier experiment; the gains all come from having a separate protocol handler, not from immense controller speed. For the Optimistic-Optimistic case ($R = 500$; $M_r = 20$), multithreading with up to four threads results in better inter-transaction issue times. For the Optimistic-Pessimistic case ($R = 500$; $M_r = 100$), up to eight threads can be profitably used to reduce latency. For the Pessimistic-Optimistic case ($R = 50$; $M_r = 20$), only four threads provide speedup. Again, these limitations are due to the thread run length for small machines, or machines with very high locality, and to the protocol overhead for large machines with medium to low locality.
[Figure 6.4 panels include Optimistic-Optimistic (R = 500; M_r = 20) and Optimistic-Pessimistic (R = 500; M_r = 100); x-axis: machine size, up to 32768.]
Figure 6.4: This figure shows the predictions for the protocol overhead as a function of machine size and of locality. The graphs on the left side are for no multithreading ($p_m = 1$); the ones on the right are for a multithreading of two ($p_m = 2$).
[Figure 6.5 panels include Optimistic-Pessimistic (R = 500; M_r = 100) and Pessimistic-Optimistic (R = 50; M_r = 20); x-axis: machine size, up to 32768.]
Figure 6.5: This figure shows the predictions for the average inter-transaction issue time as a function of machine size and of locality. The graphs on the left side are for no multithreading ($p_m = 1$); the graphs on the right are for the largest possible useful multithreading, as described in the text.
Inter-Transaction Issue Time

Figure 6.5 shows the average inter-transaction issue time for one thread and for the maximum number of useful threads, as described above. Note that we now see some improvement due to multithreading, which can be observed better in the overhead graphs.

Protocol Overhead 

We again examine the protocol overhead in order to see how much of the transaction latency is caused by overhead and how much represents work being done. Figure 6.6 shows these results.

These results are much better than before. The use of a controller provides enormous gains in practicality. For an Optimistic-Optimistic architecture, we can expect to run the protocol efficiently at locality as low as 0.7, even on very large machines. The Optimistic-Pessimistic architecture performs much better than before, although it still has too much overhead for large machines. The Pessimistic-Optimistic architecture has also improved, but one would still not want to use the protocol with this embedding on such an architecture.

Most of the speedup occurs when going from one thread to two. The gains from going beyond that are small, and occur only on the boundary between too much work and too much overhead. For the Optimistic-Optimistic architecture, the gains occur on the diagonal line between large machines with lots of locality and small machines with little locality. For the Optimistic-Pessimistic case, the line moves closer to the small machines with high locality. This trend extends to the Pessimistic-Optimistic case, implying that there the gains occur only for small machines with high locality.

Note that these gains are, of course, not large enough to justify controllers which are more expensive than processors. In no case does the addition of the controllers save more time than doubling the number of processors.

6.3 Issues

The results described in this chapter provide some insight into how the protocol actually behaves when mapped to a k-ary n-cube in the manner described in Section 2.2. There are many more interesting experiments to be done, however. First, other configurations of machines which would work better with the protocol should be studied. Second, other embeddings of the protocol into k-ary n-cubes should be considered. Finally, other cache coherence protocols should be studied to determine how competitive the performance of the Protocol for Hierarchical Directories is.

[Figure 6.6 panels include Optimistic-Optimistic (R = 500; M_r = 20) and Optimistic-Pessimistic (R = 500; M_r = 100); x-axis: machine size, up to 32768.]
Figure 6.6: This figure shows the predictions for the protocol overhead as a function of machine size and of locality. The graphs on the left side are for no multithreading ($p_m = 1$); the ones on the right are for the largest possible useful multithreading, as described in the text.

The main limitation of using this mapping to embed the protocol into an architecture is clearly the protocol overhead. There are several ways to fix this problem. One is to build fast controllers which can independently process the protocol requests, which is one of the goals of the MIT Alewife project [3]. The cost of adding such a controller to the machine must be balanced against the potential speed benefits.

Another way to reduce the overhead is to guarantee that high locality is maintained, with references to shared memory rare in comparison to the time needed to process protocol requests. In order to do this, compiler technology for static data placement must be improved: programs must be compiled specifically to reduce the amount of data sharing.

This technology would benefit all cache coherence protocols.

None of the architectures studied in this chapter was ever limited by the speed of the network. This indicates either that the assumptions imply a processor-network speed mismatch, in which the network is too fast, or that the protocol is fundamentally too slow.

Studies using a very fast controller with fast processors, or fast controllers with slow processors, could be used to evaluate how the protocol performance is affected by the embedding.

This study does not indicate whether or not PHD would be more useful for large machines than schemes with limited directories, or even schemes without caching. A study comparing these schemes for different values of the locality parameter would be very enlightening. We believe that PHD will perform best on large machines with decent hierarchical locality and low protocol overhead. Whether or not these conditions will occur for real applications is unknown.

6.4 Summary

In this chapter we used an embedded model to show the performance of the protocol mapped onto various architectures. We looked at average inter-transaction issue time and protocol overhead for different locality parameters, degrees of multithreading, and machine sizes.

We determined that multithreading is only useful for approximately two to four threads; any additional interleaving does not decrease the overall latency. For small machines and high-locality applications, this limitation is due mainly to the length of the running threads. For large machines with medium to low locality, this limitation is due mainly to the high protocol overhead.

We discovered that the embedding will work well given fast protocol processing time and relatively few references to shared memory. In the best case only 9% of all cycles are taken up by protocol overhead, for small machines with 0.9 locality. This increases to 28% for large machines (32768 processors) with high locality, and 39% for small machines with poor locality.
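The two limitations above can be illustrated with a generic saturation model of the kind often used for multithreaded processors. This is a sketch under our own assumptions, not the embedded model of this chapter: each transaction costs the processor a run length plus a context switch and some local protocol processing, and each thread then waits a fixed latency for its shared-memory request. The function and parameter names are hypothetical; the overhead follows the nomenclature's definition, O = 1 - R/U.

```python
import math
from fractions import Fraction


def issue_time(threads, run, switch, protocol, latency):
    """Average inter-transaction issue time U in a toy saturation model.

    Assumption (not the thesis's embedded model): every transaction costs the
    processor run + switch + protocol cycles of work, after which the issuing
    thread waits `latency` cycles. For simplicity the switch cost is charged
    even with a single thread.
    """
    busy = run + switch + protocol
    # U is bounded below by the processor's own work (busy-bound) and by the
    # threads' inability to cover the latency (latency-bound).
    return max(Fraction(busy), Fraction(busy + latency, threads))


def overhead(threads, run, switch, protocol, latency):
    """O: fraction of the issue time not spent running a thread."""
    return 1 - Fraction(run) / issue_time(threads, run, switch, protocol, latency)


def useful_threads(run, switch, protocol, latency):
    """Smallest thread count beyond which U no longer decreases."""
    busy = run + switch + protocol
    return math.ceil(Fraction(busy + latency, busy))
```

With run=10, switch=2, protocol=8 and latency=40, U drops from 60 to 30 to 20 cycles as the thread count goes from one to three and then stays at 20, so a fourth thread buys nothing. The remaining overhead of 1/2 is entirely switch and protocol cost, which is why, in this toy model, a large protocol cost rather than the number of threads bounds performance.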

With the use of separate cache controllers, we can do even better. For a locality of 0.9, we can reduce the overhead to 1% for up to 32768 processors. For a locality of 0, we can see as little as 4% overhead for 64 processors, rising rapidly as the number of processors increases. The gains from the addition of these controllers, however, are not large enough to justify hardware which is more expensive than processors. In no case does the addition of the controllers save more time than doubling the number of processors.



Chapter 7

Conclusion

7.1 Summary

This thesis described the Protocol for Hierarchical Directories, a hierarchical, directory-based cache coherence scheme. PHD supports read, write, and test-and-set operations.

Read requests are satisfied in the smallest subtree containing both the requester and a copy of the requested block; only three sets of messages are sent up or down that subtree. Write requests are confined to the subtree containing the lowest common ancestor of the requester and all copies of the requested block; four sets of messages are sent up and down the hierarchy, two of which fan out to all nodes with copies. Test-and-set requests are implemented as an optimized combination of read and write requests, and implement a test-and-test-and-set operation.
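The difference between the read and write scopes can be made concrete with a small sketch. It assumes, for illustration only, that the processors are the leaves of a complete b-ary tree, numbered so that every level-l subtree holds a contiguous block of b^l processors; the function names are hypothetical. A read reaches the smallest subtree containing the requester and the nearest copy, whereas a write must reach the lowest common ancestor of the requester and every copy.

```python
def subtree_height(leaves, b):
    """Height of the smallest subtree of a b-ary hierarchy containing all leaves.

    Assumes an illustrative numbering in which each level-l subtree holds a
    contiguous block of b**l processors, so a node's parent is id // b.
    """
    ids = set(leaves)
    height = 0
    while len(ids) > 1:
        ids = {i // b for i in ids}  # move every node up one level
        height += 1
    return height


def read_height(requester, copies, b):
    """A read needs only the smallest subtree holding the requester and *a* copy."""
    return min(subtree_height({requester, c}, b) for c in copies)


def write_height(requester, copies, b):
    """A write is confined to the subtree rooted at the LCA of the requester and *all* copies."""
    return subtree_height({requester, *copies}, b)
```

For b = 4, a requester at processor 5 with copies at processors 6 and 40 reads from height 1 (processor 6 shares its level-1 subtree) but writes at height 3, because the copy at 40 lies in a different level-2 subtree.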

An embedding of PHD into k-ary n-cubes was also proposed and evaluated. The mapping translates hierarchical locality into physical locality. The mapping also distributes higher-level tree nodes over many physical processors, both to increase bandwidth and to prevent bottlenecks at the top of the tree.
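The network side of the embedding is described by the parameters k, n, k_d and d of Appendix A. As a hedged illustration only (the thesis's own network model follows Johnson [13] and may make different assumptions), the sketch below computes the average hop count of a k-ary n-cube by brute force, assuming bidirectional wraparound links, minimal routing, and destinations chosen uniformly among all nodes including the source. The function names are ours.

```python
from fractions import Fraction
from itertools import product


def ring_distance(i, j, k):
    """Hops between positions i and j on a bidirectional ring of k nodes."""
    delta = abs(i - j)
    return min(delta, k - delta)


def avg_hops_per_dimension(k):
    """k_d: mean ring distance over all (source, destination) pairs."""
    total = sum(ring_distance(i, j, k) for i in range(k) for j in range(k))
    return Fraction(total, k * k)


def avg_hops(k, n):
    """d: mean hop count in a k-ary n-cube, by enumerating every node pair."""
    nodes = list(product(range(k), repeat=n))
    total = sum(
        sum(ring_distance(a, b, k) for a, b in zip(src, dst))
        for src in nodes
        for dst in nodes
    )
    return Fraction(total, len(nodes) ** 2)
```

Under these assumptions k_d = k/4 for even k and (k^2 - 1)/(4k) for odd k, and by linearity d = n * k_d, which the brute-force total confirms for small cubes.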

We built a simulator to experiment with PHD. The simulator implements the full protocol plus certain extensions, such as local allocation and optional automatic allocation of uninitialized data. The simulator is trace-driven, and can gather many types of statistics for studying the protocol. The simulator has been used to test the protocol; additional features for debugging include printing events and cache-emptying events. A special verification program was also designed to ensure that the protocol kept the memory consistent.

This thesis describes two analytical models: an abstract one and an embedded one.

The abstract model characterizes aspects of the protocol which are not dependent on the architecture on which the protocol is run, and can be used to evaluate other hierarchical protocols. The embedded model describes the behavior of PHD as it interacts with a machine which has particular network and processor characteristics. The embedded model derives its inputs from the outputs of the abstract model.

7.2 Contributions

PHD is scalable in cost and network latency. Unlike other hierarchical protocols, there is no bottleneck at the top of the hierarchy. The protocol uses fewer hierarchy traversals and a shorter critical path to satisfy read operations than do other hierarchical protocols. The protocol supports asynchronous invalidation through the notion of ownership.

We proposed a method of expressing locality in applications mapped onto hierarchical architectures and successfully used this model to predict the average height per request, the average longest path per request, and the average number of messages sent per request. We used three applications in order to validate this abstract model: a uniform reference stream, a relaxation algorithm, and a clustering data-reference stream. After validating the model, we employed it to predict the behavior of the protocol on very large hierarchies, studying how the model results varied with machine size and locality.
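For the uniform reference stream, one of the three validation workloads, the average height has a simple closed form under one extra assumption of ours: every request targets a single owner chosen uniformly among all N = b^(L-1) processors, including the requester itself. Since a uniformly chosen owner lies inside the requester's level-(h-1) subtree with probability b^(h-1)/N, the expected height is the sum over h = 1, ..., L-1 of 1 - b^(h-1)/N. The sketch checks this against direct enumeration; the helper names are hypothetical.

```python
from fractions import Fraction


def lca_height(i, j, b):
    """Height of the smallest b-ary subtree containing leaves i and j."""
    h = 0
    while i != j:
        i, j, h = i // b, j // b, h + 1
    return h


def avg_height_enumerated(b, levels):
    """Mean height over every (requester, owner) pair, N = b**(levels - 1)."""
    n = b ** (levels - 1)
    total = sum(lca_height(i, j, b) for i in range(n) for j in range(n))
    return Fraction(total, n * n)


def avg_height_closed_form(b, levels):
    """E[h] = sum over h of P(height >= h) = sum of 1 - b**(h-1) / N."""
    n = b ** (levels - 1)
    return sum(1 - Fraction(b ** (h - 1), n) for h in range(1, levels))
```

For b = 2 and L = 3 both give 5/4. With several copies of a block, read heights would be lower than this single-owner figure.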

This abstract model was used to generate the inputs to an embedded model; the embedded model described how the protocol runs when mapped onto particular machines. We looked at average inter-transaction issue time and protocol overhead for different locality parameters, degrees of multithreading, and machine sizes.

The embedding performs well when the run length between references to shared memory is at least an order of magnitude less than the time spent to process a protocol state transition. If separate controllers for processing protocol requests are included, the protocol scales to 32K-processor machines as long as applications exhibit hierarchical locality: at least 22% of the global references must be able to be satisfied locally, and at most 35% of the global references are allowed to reach the top level of the hierarchy. Without the use of separate controllers, latency cannot be hidden effectively by multithreading because processors spend too much of their time satisfying protocol requests.

7.3 Discussion

This thesis has exposed several major areas of research which should be pursued. The tradeoffs involved in designing good hierarchical cache coherence protocols should be characterized. The abstract model of the protocol would benefit from a better understanding of the locality parameter. With some additional work, the embedded model can be used to aid in the design of shared-memory machines.

This thesis discussed some of the decisions which were made in the design of a hierarchical cache coherence system. The effects of these decisions have not been fully explored.

A comparison of PHD and another hierarchical cache coherence protocol would still be instructive.

Currently, the locality parameter set can only be determined for an application by simulation. We have performed some initial work towards deriving the set of locality parameters from a spatial locality model of an application, such as that available for the cluster application. The derivation works best, however, for applications which exhibit a very high degree of clustering. More work needs to be done in this area.

All of our large-machine studies use a flat set of locality parameters to constrain the study space. Using a flat set of locality parameters, however, is not necessarily realistic. For large applications running on massively parallel machines, we might expect less sharing to occur near the top of the hierarchy, and more at the bottom. Although few large applications exist today, as they are written they can be studied in order to determine reasonable locality parameters.

For applications which have a large variance in the types of data referencing, several sets of locality parameters can be used to avoid averaging effects. This would allow one to separate widely shared data, such as synchronization variables, from less-used data. This separation would be useful because an application may stall due to synchronization instead of normal changed data. This method might also provide new insight into the interactions between shared data and program execution-time behavior.
Chapter 6 shows that the main limitation of using the mapping proposed in this thesis to embed the protocol into a k-ary n-cube is the potentially high protocol overhead. Several ways to fix this problem should be explored. One is to build fast controllers which can independently process the protocol requests, the approach of the MIT Alewife project [3]. The cost of adding such a controller to the machine must be balanced against the potential speed benefits.

None of the architectures studied in this thesis was limited by the speed of the network. This indicates either that there is a processor-network speed mismatch, in which the network is too fast, or that the protocol is fundamentally too slow. Studies using a very fast controller with fast processors, or fast controllers with slow processors, could be used to evaluate how the protocol performance is affected by the layout, and possibly how to build shared-memory machines.

There are many areas left to be explored. Uppermost in our minds is the question of whether or not hierarchical protocols will perform better than flat directory schemes, or even no caching at all, for actual applications. Determining exactly where the tradeoffs lie in complexity, technology, application locality, compilation time, and machine size would be extremely enlightening. We believe that PHD will perform best on large machines with low protocol overhead running applications exhibiting hierarchical locality patterns. Whether or not these conditions will occur for real applications is unknown.

Regardless of what cache coherence scheme is chosen, compilers must be developed to minimize data sharing. The scheduling of processes and the placement of data will be some of the most important problems in building massively parallel computer systems.



Appendix A

Nomenclature


Symbol          Description
B               Average message size (in flits).
C               Number of words in a cache line.
E_l             Probability a node accesses blocks owned by a node in its level-l subtree.
L               Number of levels in the hierarchy.
                Average time to satisfy a locally satisfiable request to shared memory.
M_r             Average time to process a protocol message invoked on a processor.
N               Number of processors.
N_i             Average network interface overhead.
O               Overhead: fraction of the average transaction issue time not spent running.
R               Average thread run length between successive requests to shared memory.
R_l             Number of reads which reach level l.
                Non-network overhead of satisfying a non-local shared-memory request.
                Average thread run length between successive non-locally satisfiable requests.
                Context switch time.
W_l             Number of writes which reach level l.
Z_r, Z_w, Z     Fraction of locally satisfiable {reads, writes, requests} to shared memory.

Table A.1: Part I of the table listing all of the parameters used by the thesis. Part II is given in Table A.2.
Symbol          Description
b               Branching factor of the hierarchy.
c_l             Number of valid children of a valid node at level l.
                Average number of messages in the critical path of a non-local shared-memory request.
d               Average distance a message travels (in hops).
d_l             Expected number of hops between a node and its physically distinct parent.
d'_l            Expected number of hops a message which circumvents the hierarchy will take.
e               Fraction of all operations by a node which occur to its own data.
f               Number of flits per word.
g_r, g_w, g     Average number of messages per non-local shared-memory {read, write, request}.
h_r, h_w, h     Average height a {read, write, request} is expected to reach.
k               Number of processors per dimension.
k_d             Average distance a message travels in each dimension.
                Longest path traversed during a {read, write, request}.
m_r, m_w, m     Average number of messages sent during a {read, write, request}.
n               Number of dimensions.
p_l             Value of the locality parameter at level l.
p_m             Degree of hardware multithreading.
r               Fraction of reads in the shared-memory reference stream.
t_l             Probability of taking a valid path down from level l during node selection.
U               The average inter-transaction issue time.
v_l^c           Probability that c children of a valid node at level l are valid.
w               Fraction of writes in the shared-memory reference stream.

Table A.2: Part II of the table listing all of the parameters used by the thesis.



Appendix B

Relaxation Calculations


The height of read and write operations for a given relaxation problem can be calculated exactly, as mentioned in Chapter 5. This appendix presents the analytical equations for two- and three-dimensional relaxation calculations.

Calculating the Characteristics of Read Operations. The first characteristic of the application that must be understood is the number of read operations to neighboring values that are local, and the number that cross various levels of the hierarchy. We will first show a derivation for the two-dimensional numbers and then for the three-dimensional numbers. Call $n$ the number of processors per dimension, $N$ the total number of processors, $x$ the number of data points per processor per dimension, and $X$ the number of data points per processor.

Note that in the rest of the thesis, $k$ is the number of processors per dimension; we use $n$ here for simplicity.

As can be seen in Figure B.1, the read references which reach the highest level in the system, $L-1$, will be the ones made by data points abutting the boldest lines, which represent the division between the four level-3 processors. The number of reads which cross these lines is $4nx$: each of the two lines is $nx$ points long, and each adjacent pair of points across a line generates one read in each direction. The number of reads which cross the next highest level is $4(2nx)$. In general, the number of reads which cross level $l$ is twice as many as cross level $l+1$, for all $l \in [1, L-2]$.

For three dimensions, the read reference calculation is similar. Here we are dealing with planes instead of lines. The number of reads which cross the highest-level planes is $6n^2x^2$. As in the two-dimensional case, the number of reads which cross level $l$ is twice as many as cross level $l+1$, for all $l \in [1, L-2]$.
[Figure B.1: grid of processors; the legend shows level-0 through level-4 nodes and level-1 through level-4 crossings, drawn with increasingly bold lines.]

Figure B.1: The tree is mapped to the processors in such a way that crossing the bolder lines represents reaching higher levels of the tree.


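In fact, we can perform this calculation for an arbitrary $d$-dimensional embedding, where $d$ is the number of dimensions. The level-$(L-1)$ crossing happens for exactly $2d(nx)^{d-1}$ reads, and the number of reads always doubles as the level decreases. The remaining reads, out of $2d\left[(nx)^d - (nx)^{d-1}\right]$ in total (boundary points read fewer neighbors), are local. The number of read references reaching each level for an arbitrary dimension is summarized in Equation B.1.

$$
R_l =
\begin{cases}
2d\left[(nx)^d - (nx)^{d-1}\right] - \sum_{h=1}^{L-1} R_h, & l = 0,\\[4pt]
2^{\,L-1-l}\cdot 2d\,(nx)^{d-1}, & l \in [1, L-2],\\[4pt]
2d\,(nx)^{d-1}, & l = L-1.
\end{cases}
\tag{B.1}
$$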
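Equation B.1 can be checked by brute force on small grids. The sketch assumes the recursive 2^d-way split suggested by Figure B.1 (a quadtree in two dimensions, an octree in three), with n = 2^(L-1) processors per dimension and x data points per processor per dimension; each point reads its 2d nearest neighbors without wraparound. The function names are ours.

```python
import itertools
from collections import Counter


def _height(p, q):
    """Level of the smallest 2**d-way subtree holding processors p and q."""
    h = 0
    while [c >> h for c in p] != [c >> h for c in q]:
        h += 1
    return h


def read_heights(levels, x, d):
    """Brute-force count of nearest-neighbour reads reaching each level."""
    n = 2 ** (levels - 1)  # processors per dimension
    size = n * x           # data points per dimension
    counts = Counter()
    for point in itertools.product(range(size), repeat=d):
        owner = [c // x for c in point]
        for axis in range(d):
            for step in (-1, 1):
                nb = list(point)
                nb[axis] += step
                if not 0 <= nb[axis] < size:
                    continue  # boundary points have fewer neighbours
                counts[_height(owner, [c // x for c in nb])] += 1
    return counts


def read_heights_formula(levels, x, d):
    """Equation B.1."""
    nx = 2 ** (levels - 1) * x
    top = 2 * d * nx ** (d - 1)
    r = {l: 2 ** (levels - 1 - l) * top for l in range(1, levels)}
    r[0] = 2 * d * (nx ** d - nx ** (d - 1)) - sum(r.values())
    return r
```

For L = 3 and x = 3 in two dimensions, both give 384 local reads and 96 and 48 reads reaching levels 1 and 2.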


Calculating the Characteristics of Write Operations. We are now prepared to calculate the exact number of writes which must reach a particular height. Instead of summing, for every point, the heights of its neighbors, we must take a maximum. As can be seen in Figure B.2, there are many grid points that have neighbors at varying heights. Data point a is a typical data point, with all of its neighbors local. The maximum height a write to this point could reach, therefore, is 0. Data point b has a neighbor which is across a level-1 boundary. Since the rest of its neighbors are local, the maximum height is 1. Both points c and d have neighbors across level-2 boundaries, so they must be counted at height 2. The three points e, f, and g are similarly counted at height 3, and h, i, j, and k are similarly counted at height 4.

[Figure B.2: grid with labeled data points a through k and level-1 through level-4 crossing lines.]

Figure B.2: The labeled points represent pieces of data which must be carefully considered when determining the height that a write to that data will reach.

All points to be counted at level $L-1$ can easily be calculated as $4nx - 4$. The $4nx$ was derived for the read case, and the subtraction of 4 refers to the four center points of the cross, each of which was double counted in the read formula.¹ To calculate the formula for $l \in [1, L-2]$, we count all of the points on a cross for level $l$, subtracting out the four center ones as in the $l = L-1$ case, and then multiply that quantity by the number of crosses at that level. We then must subtract off all points which are supposed to be counted as higher-level points: the two points on either side of each cross arm that ends on a higher-level boundary. There are $4C_l(C_l-1)$ such arm ends, giving the $8C_l(C_l-1)$ term. The resulting formula, which applies only to the two-dimensional case, is given in Equation B.2. Note that $C_l$, the number of level-$l$ crosses per dimension, is $2^{L-1-l}$, so each level-$l$ cross spans $nx/C_l$ data points per dimension.


$$
W_l =
\begin{cases}
n^2(x-2)^2 + 4n(x-2) + 4, & l = 0,\\[4pt]
C_l^2\left(\dfrac{4nx}{C_l} - 4\right) - 8C_l(C_l-1), & l \in [1, L-2],\\[4pt]
4nx - 4, & l = L-1.
\end{cases}
\tag{B.2}
$$


¹ This double count is appropriate for a read, since more than one read occurs to every block.

Calculating the number of data points which have completely local neighbors is fairly simple. Note that we are assuming that points which lie on the boundary are read fewer times (as many fewer times as neighbors they lack). There are $(x-2)^2$ completely local points per node, plus $x-2$ boundary points for each node side lying on the edge of the grid, plus an extra point (the corner one) on each corner node.
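The same brute-force approach confirms Equation B.2, now taking the maximum rather than the sum over a point's neighbors. As in the read sketch, the quadtree mapping, the grid sizes, and the function names are illustrative assumptions; the closed form needs L >= 2 and x >= 2.

```python
from collections import Counter


def _height(p, q):
    """Level of the smallest quadtree subtree holding processors p and q."""
    h = 0
    while (p[0] >> h, p[1] >> h) != (q[0] >> h, q[1] >> h):
        h += 1
    return h


def write_heights_2d(levels, x):
    """Brute force: a write reaches the highest level of any neighbour read."""
    size = 2 ** (levels - 1) * x
    counts = Counter()
    for i in range(size):
        for j in range(size):
            owner = (i // x, j // x)
            h = 0
            for a, b in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
                if 0 <= a < size and 0 <= b < size:
                    h = max(h, _height(owner, (a // x, b // x)))
            counts[h] += 1
    return counts


def write_heights_b2(levels, x):
    """Equation B.2 (requires levels >= 2 and x >= 2)."""
    n = 2 ** (levels - 1)
    w = {0: n**2 * (x - 2)**2 + 4 * n * (x - 2) + 4,
         levels - 1: 4 * n * x - 4}
    for l in range(1, levels - 1):
        c = 2 ** (levels - 1 - l)  # C_l: crosses per dimension
        w[l] = c**2 * (4 * n * x // c - 4) - 8 * c * (c - 1)
    return w
```

For L = 4 and x = 3 both methods give 100, 224, 160, and 92 points at heights 0 through 3, which together account for all 24 x 24 points.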

In the three-dimensional case we must consider three intersecting planes instead of two intersecting lines. The number of writes which reach the highest level, however, is still straightforward: from the read case we know that there are $6n^2x^2$ crossings of the three planes. The three planes intersect at three separate lines, each of which generates four double-counted grid units per line unit, which must be subtracted from the earlier total. The three lines, however, intersect in one point which has eight double-counted grid units around it. These eight grid units must be added back to the total, resulting in the formula given in Equation B.3.


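$$
W_l =
\begin{cases}
n^3(x-2)^3 + 6n^2(x-2)^2 + 12n(x-2) + 8, & l = 0,\\[4pt]
C_l^3\left(6\left(\dfrac{nx}{C_l}\right)^2 - 12\,\dfrac{nx}{C_l} + 8\right) - 24\,nx\,C_l(C_l-1) + 24\left[C_l^2(C_l-1) + C_l(C_l-1)^2\right], & l \in [1, L-2],\\[4pt]
6n^2x^2 - 12nx + 8, & l = L-1.
\end{cases}
\tag{B.3}
$$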


We now calculate the equation for $l \in [1, L-2]$ for the three-dimensional case. First we consider each level-$l$ subunit. There are $C_l^3$ such subunits. As in the $l = L-1$ case, we count all the points along the three planes, subtract off the line ones, and add back in the center eight ones. We now must account for all the points which are counted at a higher level. These points are the ones at the boundaries of the subunits. We will calculate these by looking at the whole cube, not at subunits. Consider a face of the cube, as in Figure B.3. Each dotted-line cross in the right side of the figure is the edge view of the three-dimensional object shown at the left side of the figure. The bold lines on the left figure indicate which points have been double counted and must be subtracted out of the total. Each diamond and circle in the right figure represents a line that must be subtracted out. There are $C_l(C_l-1)$ circle lines, and exactly the same number of diamond lines, per dimension. For every circle or diamond line, $4nx$ points must be subtracted out.
Figure B.3: Each dotted-line cross is the edge of the intersection of two planes. At the boundaries of these three-dimensional crosses are lines (the endpoints of which are marked as circles and diamonds) which contain the points that are supposed to be counted at a higher level.

Figure B.4: The circles and diamonds represent the endpoints of lines. The intersections of these lines have been doubly subtracted in our total, and must be added back.
After subtracting out all of those points, we still do not have the correct equation. Everywhere the circle and diamond lines intersected, we double-subtracted points, and we must now add them back. Consider Figure B.4, and study the front and top faces of the cube.

We can see that per front column of circles, $C_l$ front circles intersect $C_l$ top circles. There are $C_l - 1$ front columns of circles. Similarly, per front column of diamonds, $C_l - 1$ front diamonds intersect $C_l - 1$ top diamonds. There are $C_l$ front columns of diamonds. This set of intersections occurs once for every pair of dimensions (i.e., three times), and generates eight points to be added back per crossing. The resulting formula was already shown in Equation B.3.

We now calculate the number of data points which have completely local neighbors for the three-dimensional case. There are $(x-2)^3$ completely local points per node, plus $(x-2)^2$ boundary points for each node face lying on the surface of the cube, plus $x-2$ boundary points for each node edge lying on an edge of the cube, plus an extra point (the corner one) on each corner node. Again, the resulting formula was already shown in Equation B.3.
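Equation B.3 can be checked the same way, and a shortcut keeps the three-dimensional check fast: every neighbor differs from a point in exactly one coordinate, so under the octree mapping (our assumption, as in the earlier sketches) a point's write height is the largest of its three per-coordinate levels. The sketch computes those levels once per coordinate and compares the resulting counts with the closed form; the function names are hypothetical.

```python
from collections import Counter
from itertools import product


def coord_level(c, x, size):
    """Highest hierarchy level crossed by a one-step move from coordinate c."""
    proc, best = c // x, 0
    for nb in (c - 1, c + 1):
        if 0 <= nb < size:
            other, h = nb // x, 0
            while (proc >> h) != (other >> h):
                h += 1
            best = max(best, h)
    return best


def write_heights_3d(levels, x):
    """Write height of a point = max of its three per-coordinate levels."""
    size = 2 ** (levels - 1) * x
    lvl = [coord_level(c, x, size) for c in range(size)]
    return Counter(max(lvl[i], lvl[j], lvl[k])
                   for i, j, k in product(range(size), repeat=3))


def write_heights_b3(levels, x):
    """Equation B.3 (requires levels >= 2 and x >= 2)."""
    n = 2 ** (levels - 1)
    nx = n * x
    w = {0: n**3 * (x - 2)**3 + 6 * n**2 * (x - 2)**2 + 12 * n * (x - 2) + 8,
         levels - 1: 6 * nx**2 - 12 * nx + 8}
    for l in range(1, levels - 1):
        c = 2 ** (levels - 1 - l)  # C_l: subunits per dimension
        s = nx // c                # subunit side, in data points
        w[l] = (c**3 * (6 * s**2 - 12 * s + 8)
                - 24 * nx * c * (c - 1)
                + 24 * (c**2 * (c - 1) + c * (c - 1)**2))
    return w
```

For L = 3 and x = 3 both give 216, 784, and 728 points at heights 0, 1, and 2, which sum to 12^3 = 1728.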



Appendix C

Table of Protocol Behavior


The following sections detail the behavior of the Protocol for Hierarchical Directories. The first describes the transitions for leaf nodes, and the second describes the transitions for parent nodes.

C.1 Leaf Node Transition Table

The leaf node transitions are a function of the current state and the input message. For every such combination, there is a possible new state to transition to as well as a possible message to send. The possible states are enumerated in Table C.1. Table C.2 lists all of the messages that can be received by a leaf node. These messages are explained in Table 2.4.

Table C.3 explains all of the symbols used in the transition table. The messages which can be sent by a leaf node are listed in Table C.4, and a further expansion of the abbreviations is listed in Table C.7. The actual transition table is split into two parts in Table C.5.
Symbol              Expansion
invalid             invalid
r_yo_npl            readable_yowner
r_no_npl            readable_nowner
wfr_no_npl          waiting_for_read
w_yo_npl            writable
wfw_no_npl_nr       waiting_for_write_nowner_npl_nread
wfw_no_npl_yr       waiting_for_write_nowner_npl_yread
wfw_yo_npl          waiting_for_write_yowner_npl
wfw_no_ypl_nr       waiting_for_write_nowner_ypl_nread
wfw_no_ypl_yr       waiting_for_write_nowner_ypl_yread
wfwo_yo_npl         waiting_for_write_ok_yowner_npl
wfwo_yo_ypl         waiting_for_write_ok_yowner_ypl
wfwv_no_ypl_nr      waiting_for_write_value_nowner_ypl_nread
wfwv_no_ypl_yr      waiting_for_write_value_nowner_ypl_yread
wft_no_npl_nr       waiting_for_tas_nowner_npl_nread
wft_no_npl_yr       waiting_for_tas_nowner_npl_yread
wft_yo_npl          waiting_for_tas_yowner_npl
wft_no_ypl_nr       waiting_for_tas_nowner_ypl_nread
wft_no_ypl_yr       waiting_for_tas_nowner_ypl_yread
wfto_yo_npl         waiting_for_tas_ok_yowner_npl
wfto_yo_ypl         waiting_for_tas_ok_yowner_ypl
wftv_no_ypl_nr      waiting_for_tas_value_nowner_ypl_nread
wftv_no_ypl_yr      waiting_for_tas_value_nowner_ypl_yread

Table C.1: The abbreviations for states used in the leaf transition table.


Symbol    Expansion
dr        read request
r         read
rd        read_data
dw        write request
l         lock
swo       s_write_own
wo        write_ok
dt        tas request
rt        read_tas
tf        tas_failed

Table C.2: The abbreviations for input messages and requests used in the leaf transition table.
Symbol         Expansion
.              Stay in the same state.
NUMBER         Go to the state numbered NUMBER.
D              Put message in delaying queue.
X              Error.
DX             X if we issue only one request at a time, D otherwise.
z: ACTION      If value of the block is zero then ACTION.
nrq: ACTION    If current node originated the request then ACTION.
!: ACTION      Else ACTION.

Table C.3: The abbreviations for symbols used in the leaf transition table.


Symbol    Expansion
fr        Send flcfr up to parent.
rfr       Send rflcfr up to parent.
fw        Send flcfw up to parent.
a         Send a up to parent.
al        Send al up to parent.
ft        Send flcft up to parent.
rft       Send rflcft up to parent.
cv        Send cv up to parent.
rd        Send rd to the reader.
swo       Send swo to the writer.
tf        Send tf to the tas requester.

Table C.4: The abbreviations for output messages used in the leaf transition table.
[Table C.5 body: rows are the leaf states of Table C.1, numbered 0 (invalid), 1 (r_yo_npl), 2 (r_no_npl), 3 (wfr_no_npl), 4 (w_yo_npl), 5 (wfw_no_npl_nr), 6 (wfw_no_npl_yr), 7 (wfw_yo_npl), 8 (wfw_no_ypl_nr), 9 (wfw_no_ypl_yr), 10 (wfwo_yo_npl), 11 (wfwo_yo_ypl), 12 (wfwv_no_ypl_nr), 13 (wfwv_no_ypl_yr), and so on; columns are the input messages of Table C.2; entries use the symbols of Tables C.3 and C.4.]

Table C.5: The transition table for leaf nodes.
C.2 Parent Node Transition Table

The state of a parent (non-leaf) node includes the full vector describing its child subtrees. For this reason, the transition table must be collapsed (just eight subtrees increase the total number of states by a factor of 4^8) in order to express it in a reasonable amount of room. The table therefore takes three inputs: the current state (which does not include the state of the subtree vector), the subtree vector combination, and the input message. In response to a message, a node may send a message, perform an action, change its own state, or any combination of the above. All of these responses may be modified by a conditional expression further specifying the state of the node. The table additionally contains assertions about the state of a node for some of the entries. These assertions are not required to implement the protocol, but are useful to understand what must be happening when a node reaches a particular state.

The list of states is enumerated in Table 2.3. The possible vector combinations are listed in Table C.6. The messages that a parent node might receive are listed in Table C.7; an explanation of these messages is in Table 2.4. The list of actions a node may perform is enumerated in Table C.8. Table C.9 lists the messages that might be sent by a parent node.

Table C.10 explains all of the assertions and predicates used in the transition table. The actual node transition table spans multiple pages, and is referred to as Table C.11.


Symbol    Vector        Expansion
c         v0_w0_cX      All subtrees are either confirmed or invalid.
v         vX_w0_c0      All subtrees are either valid or invalid.
vw        vX_wX_c0      All subtrees are valid, waiting, or invalid.
vc        vX_w0_cX      All subtrees are valid, confirmed, or invalid.
vwc       vX_wX_cX      All subtrees are valid, waiting, confirmed, or invalid.

Table C.6: The abbreviations for the vector combinations used in the parent transition table.
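To make the collapsed columns concrete, the sketch below tests which Table C.6 combinations a child vector satisfies. It assumes, from the actions in Table C.8, that each subtree is in one of four states (valid, waiting, confirmed, invalid), and it reads "X" in Table C.6 as "any number" and "0" as "none"; the class and function names are hypothetical. Because an all-invalid vector trivially satisfies every description, a vector can match several columns.

```python
from enum import Enum
from itertools import product


class Subtree(Enum):
    """Per-child state kept by a parent node (assumed four-state model)."""
    INVALID = "i"
    VALID = "v"
    WAITING = "w"
    CONFIRMED = "c"


# Table C.6: the non-invalid child states each combination admits.
COMBINATIONS = {
    "c": {Subtree.CONFIRMED},
    "v": {Subtree.VALID},
    "vw": {Subtree.VALID, Subtree.WAITING},
    "vc": {Subtree.VALID, Subtree.CONFIRMED},
    "vwc": {Subtree.VALID, Subtree.WAITING, Subtree.CONFIRMED},
}

# Number of raw child vectors for eight subtrees: 4**8 = 65536.
RAW_VECTORS = sum(1 for _ in product(Subtree, repeat=8))


def matching_combinations(vector):
    """Return every Table C.6 column whose description fits this child vector."""
    present = {s for s in vector if s is not Subtree.INVALID}
    return [name for name, allowed in COMBINATIONS.items() if present <= allowed]
```

For example, a vector holding one valid and one confirmed child (the rest invalid) matches only the vc and vwc columns, and the five columns stand in for all 65536 raw eight-child vectors.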





Symbol    Expansion
flcfr     find_lowest_common_for_read
rflcfr    redirected_find_lowest_common_for_read
r         read
flcfw     find_lowest_common_for_write
l         lock
a         ack
al        ackl
ta        throwing_away
cte       change_to_exclusive
flcft     find_lowest_common_for_tas
rflcft    redirected_find_lowest_common_for_tas
cv        confirm_value
rd        read_data
uncv      unconfirm_value
rt        read_tas
wo        write_ok

Table C.7: The abbreviations for input messages used in the parent transition table.


Symbol   Expansion
L        Lock this node and change the writer in [...]
+v       Change the sending subtree to valid.
+c       Change the sending subtree to confirmed.
+i       Change the sending subtree to invalid.
+w       Change the sending subtree to waiting.
+e       Change this node's status to exclusive.
+s       Change this node's status to shared.

Table C.8: The abbreviations for actions used in the parent transition table.
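The actions of Table C.8 either overwrite the entry of the sending subtree in the subtree vector or change the node's own status. The Python sketch below shows one way to apply them; `ParentNode`, `apply_action`, the letter encodings, and the `writer` field are hypothetical. Because the description of L is cut off in the table, the sketch simply assumes that locking records the sending subtree as the writer; that behavior is a guess, not the thesis's definition.

```python
from dataclasses import dataclass
from typing import List, Optional

# Subtree states: "i" invalid, "v" valid, "w" waiting, "c" confirmed.
SUBTREE_ACTIONS = {"+v": "v", "+c": "c", "+i": "i", "+w": "w"}
# Node status: "E" exclusive, "S" shared (hypothetical single-letter encoding).
STATUS_ACTIONS = {"+e": "E", "+s": "S"}


@dataclass
class ParentNode:
    subtrees: List[str]           # one state letter per child subtree
    status: str = "S"
    locked: bool = False
    writer: Optional[int] = None  # hypothetical: index of the writing subtree


def apply_action(node: ParentNode, action: str, sender: int) -> None:
    """Apply one Table C.8 action; `sender` is the index of the sending subtree."""
    if action in SUBTREE_ACTIONS:
        node.subtrees[sender] = SUBTREE_ACTIONS[action]
    elif action in STATUS_ACTIONS:
        node.status = STATUS_ACTIONS[action]
    elif action == "L":
        # The description of L is truncated in Table C.8; this sketch
        # assumes the sending subtree becomes the recorded writer.
        node.locked = True
        node.writer = sender
    else:
        raise ValueError(f"unknown action {action!r}")
```

Keeping the subtree actions and the status actions in separate lookup tables mirrors the split in Table C.8 between "the sending subtree" and "this node".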
Symbol     Expansion
flcfr      Send flcfr up to parent.
rflcfr     Send rflcfr up to parent.
r          Send r down to a randomly chosen confirmed subtree.
flcfw      Send flcfw up to parent.
l          Send l down to the writing subtree.
a          Send a up to parent.
al         Send al up to parent.
ta         Send ta up to parent.
cte        Send cte down to the only non-invalid subtree.
flcft      Send flcft up to parent.
rflcft     Send rflcft up to parent.
cv         Send cv up to parent.
uncv       Send uncv up to parent.
rt         Send rt down to a randomly chosen confirmed subtree.
wo         Send wo down to the only non-invalid subtree.
rdav,c     Send rd down to all valid, making them confirmed.
rdavw,c    Send rd down to all valid or waiting, making them confirmed.
rdaw,c     Send rd down to all waiting, making them confirmed.
rdavL,c    Level 1: Send rd down to all valid except locker, making them confirmed.
           Above level 1: Send rd down to all valid including locker,
           making all but locker confirmed.
rdavwL,c   Level 1: Send rd down to all valid or waiting except locker,
           making them confirmed.
           Above level 1: Send rd down to all valid or waiting including locker,
           making all but locker confirmed.
rl         Send r down to the locking subtree.
la         Send l down to all not-invalid subtrees and the writing subtree.

Table C.9: The abbreviations for output messages used in the parent transition table.
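The rd variants in Table C.9 differ only in which subtree states receive the message and in how the locker is treated. The Python sketch below captures them with a single hypothetical function, `send_rd`, whose `targets` argument selects the variant; the function name, argument names, and letter encoding are assumptions made for illustration.

```python
from typing import List, Optional, Set

# Subtree states: "i" invalid, "v" valid, "w" waiting, "c" confirmed.


def send_rd(subtrees: List[str], targets: Set[str], level: int,
            locker: Optional[int] = None) -> List[int]:
    """Return the indices of subtrees that receive rd, marking them confirmed.

    targets selects the variant: {"v"} for rdav,c, {"v", "w"} for rdavw,c,
    {"w"} for rdaw,c. Pass `locker` only for rdavL,c and rdavwL,c.
    `subtrees` is updated in place.
    """
    recipients = []
    for k, state in enumerate(subtrees):
        if state not in targets:
            continue
        if k == locker and level == 1:
            continue  # level 1: the locker does not receive rd
        recipients.append(k)
        if k != locker:
            subtrees[k] = "c"  # above level 1 the locker receives rd but is not confirmed
    return recipients
```

With `locker=None` the level argument has no effect, which matches the three variants that do not mention the locker.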
Symbol   Expansion
rC       Sending subtree is not the only confirmed subtree.
rV       Sending subtree is not the only valid subtree.
iW       Sending subtree is not waiting.
iL       Sending subtree is not the locker of this node.
ii       Sending subtree is invalid.
ivwc     Sending subtree is valid, waiting, or confirmed.
ivw      Sending subtree is valid or waiting.
ivc      Sending subtree is valid or confirmed.
ic       Sending subtree is confirmed.
iv       Sending subtree is valid.
il       Sending subtree is the locker of this node.
svc      Subtree of requester of operation is valid or confirmed.
sv       Subtree of requester of operation is valid.
sc       Subtree of requester of operation is confirmed.
lv       Locker's subtree is valid.
lc       Locker's subtree is confirmed.
lw       Locker's subtree is not waiting.
0c       Zero subtrees are confirmed.
oc       At least one subtree is confirmed.
0v       Zero subtrees are valid.
ov       At least one subtree is valid.
T        This node is not the top level node in the hierarchy.
t        This node is the top level in the hierarchy.
1        This node is at level one.
a        This node is above level one.
1vwc     Exactly one subtree is valid, waiting, or confirmed.
1vw      Exactly one subtree is valid or waiting.
1vc      Exactly one subtree is valid or confirmed.
1v       Exactly one subtree is valid.
1c       Exactly one subtree is confirmed.
ORP      This node is on the request path of the current operation.
NRP      This node is not on the request path of the current operation.

Table C.10: The abbreviations for assertions used in the parent transition table.
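Many of the predicates in Table C.10 are simple counts over the subtree vector. The Python sketch below evaluates the counting predicates (0c, 0v, 1c, 1v, 1vc, 1vw, 1vwc) and one reading of rC and rV. The function names and letter encoding are hypothetical, and interpreting "the sending subtree is not the only confirmed subtree" as "some subtree other than the sender is confirmed" is our assumption.

```python
from typing import List

# Subtree states: "i" invalid, "v" valid, "w" waiting, "c" confirmed.


def _count(subtrees: List[str], states: str) -> int:
    """Number of subtrees whose state letter appears in `states`."""
    return sum(1 for s in subtrees if s in states)


def counting_predicate(symbol: str, subtrees: List[str]) -> bool:
    """Evaluate 0c, 0v, 1c, 1v, 1vc, 1vw or 1vwc from Table C.10."""
    quantifier, states = symbol[0], symbol[1:]
    n = _count(subtrees, states)
    if quantifier == "0":
        return n == 0
    if quantifier == "1":
        return n == 1
    raise ValueError(f"unsupported predicate {symbol!r}")


def others_in_state(subtrees: List[str], sender: int, state: str) -> bool:
    """rC / rV, read here as: some subtree other than the sender is in `state`."""
    return any(s == state for k, s in enumerate(subtrees) if k != sender)
```

Under this reading, `assert rC` in row C S_U_NOP_NGA 1 corresponds to `others_in_state(vector, sender, "c")`, which must hold if the node is to forward r to a confirmed subtree.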






PARENT NODE TRANSITION TABLE


FLCFR
INVALID 0
if(t){
alloc;
next_state 2;
} else {
do +v;
send flcfr;
next_state 8;
}

C S_U_NOP_NGA 1
assert rC;
do +v;
send r;
next_state 20;

C E_U_NOP_NGA 2
assert rC;
do +v;
send r;
next_state 21;


RFLCFR 
I NVALI DO 
assert T) 
assert NRP; 
send rflcf r; 
next .state 


I NVALI D 0 
assert T) 
send rflcfr; 
next .state 


C S-U-NOP-NGAl 

C S_U_NOP_NGA 1 

i f ( orp) { 

send r; 

assert sc; 

next .state . ; 

} 


send r; 


next .state . ; 


c e_u_nop_nga2 

C E_U_NOP_NGA 2 

i f ( orp) { 

do -fs; 

assert sc; 

send r; 

} 

next .state 1; 

send r; 


next .state . ; 



FLCFW
INVALID 0
if(t){
alloc;
next_state 2;
} else {
send flcfw;
next_state .;
}

C S_U_NOP_NGA 1
assert T;
send flcfw;
next_state .;

C E_U_NOP_NGA 2
do L;
send la;
if(ii){
do +v;
next_state 24;
} else {
next_state 5;
}


C S-L-NOP-NGA 3 

c s_l_nop_nga3 

C S _L -NOP-NGA 3 

C S_L_NOP-NGA 3 

if(ii){ 

i f ( orp) { 

send r; 

send E| 

send E| 
next .state . ; 

} el se { 
assert rQ 
do -tv; 
send r; 

next .state 22; 

} 

C S_L_YOP-NGA 4 

assert sc; 

} 

send r; 
next .state . ; 

next .state . ; 

ne xt _s t at e . ; 

c s_l_yop_nga4 

C S-L-YOP-NGA 4 

C S-L-YOP-NGA 4 

if(ii){ 

i f ( orp) { 

send r; 

send E| 

send E| 
next .state . ; 

} el se { 
do -tv; 
send r; 

next _state 23; 

assert sc; 

} 

send r; 
next .state . ; 

next .state . ; 

ne xt _s t at e . ; 


Table C.11: The transition table for parent nodes.





C E_L_YOP_NGA 5 

if(ii){ 
send Ej 
next .state . ; 

} el se { 
do -+v; 
send r; 

next .state 24; 

} _ 

C S _L_YOP_YGA 6 

if(ii){ 
send Ej 
next .state . ; 

} el se { 
do -tv; 
send r; 

next .state 25; 

} _ 

C E_L_YOP-YGA 7 

if(ii){ 
send Ej 
next .state . ; 

} el se { 
do -tv; 
send r; 

next .state 26; 

} _ 

V S_U_NOP_NGA 8
assert rV;
do +w;
next_state 14;


V S -L-NOP-NGA 9 

if(ii){ 
send Ej 
next .state . ; 

} el se { 
assert T) 
send flcf r; 
next _s t at e . ; 


c e_l_yop_nga5 
i f ( orp) { 
assert sc; 

} 

send r; 
next .state . ; 


c s_l_yop_yga 6 
i f ( orp) { 
assert sc; 

} 

send r; 
next .state . ; 


C E-L-YOP-YGA7 
i f ( orp) { 
assert sc; 

} 

send r; 
next .state . ; 


v s_u_nop_nga8 
assert T) 
i f ( orp) { 
assert sv; 

} 

send rflcf r; 
next .state . ; 
v s_l_nop_nga 9 
assert T) 
i f ( orp) { 
assert sv; 

} 

send rflcf r; 
next _s t at e . ; 


C E-L-YOP-NGA 5 
i f( nrp){ 
send E| 
next .state . ; 

} el se { 
send r; 
next .state . ; 


C S-L-YOP-YGA 6 

send r; 
next .state . ; 


C E _L _YOP _N GA 5 

send Ej 

ne xt _s t at e . ; 


C E-L-YOP-YGA 7 
i f( nrp){ 
send E| 
next .state . ; 

} el se { 
send r; 
next .state . ; 


V S_U_NOP-NGA 8 
assert T) 
send rflcfr; 
next .state . ; 


V S-L-NOP-NGA 9 
assert T) 
send rflcfr; 
next .state . ; 


C S _L _YOP _YGA 6 

send Ej 

ne xt _s t at e . ; 


C E-L-YOP-YGA 7 

send Ej 

ne xt _s t at e . ; 


V S-U-NOP-NGA 8 
as s e r t T) 
send flcf w, 
ne xt _s t at e . ; 


V S-L-NOP-NGA 9 

send Ej 

ne xt _s t at e . ; 
V S_L_YOP_NGA 10

se nd E| 
next .state . ; 

} el se { 

i f (t —i 1 ) { 

send rl ; 
next .state . ; 

} el se { 
send flcf r; 
next .state . ; 

n_ 

V E-L-YOP-NGA 11 

se nd E| 
next .state . ; 

} el se { 
send r1; 
next .state . ; 

} _ 

V S_L_YOP_YGA 12 

se nd E| 
next .state . ; 

} el se { 
if(t-i-l){ 
send rl ; 
next .state . ; 

} el se { 
send flcf r; 
next .state . ; 

n_ 

V E-L-YOP-YGA 13 

se nd E| 
next .state . ; 

} el se { 
send r1; 
next .state . ; 

} _ 

VW S_U_NOP_NGA 14
assert rV;
assert iW;
do +w;
next_state .;


V S-L-YOP-NGAlO 

i f ( orp) { 
assert sv; 

} 

if(t){ 
send r 1 ; 

} el se { 
send rflcf r; 

} 

next .state . ; 


V E-L-YOP-NGAll 

i f ( orp) { 
assert sv; 

} 

send r1; 
next .state . ; 


v s_l_yop_yga12 
i f ( orp) { 
assert sv; 

} 

if(t){ 
send r 1 ; 

} el se { 
send rflcf r; 

} 

next .state . ; 


v e_l_yop_yga 13 
i f ( orp) { 
assert sv; 

} 

send r1; 
next .state . ; 


v w s _u_nop _nga 1 4 
assert T) 
i f ( orp) { 
assert sv; 

} 

send rflcf r; 
next _s t at e . ; 


V S-L-YOP-NGA 10 
send rl ; 
next .state . ; 


V S -L-YOP-NGA 10 

send E| 
next .state . ; 


V E _L_YOP_NGA 11 
i f ( NRP ) { 

send E| 
next .state . ; 

} el se { 
send rl ; 
next .state . ; 

_}_ 

V S_L_YOP_YGA 12 

send rl ; 
next .state . ; 


V E_L_YOP_YGA 13 
i f ( NRP ) { 
send E| 
next .state . ; 

} el se { 
send rl ; 
next .state . ; 

_}_ 

VW S -U-NOP _NGA 14 

assert T) 
send rflcfr; 
next _s t at e . ; 


V E-L-YOP-NGA 11 

send E| 
next .state . ; 


V S -L-YOP-YGA 12 

send E| 
next .state . ; 


V E-L-YOP-YGA 13 

send E| 
next .state . ; 


vw s -U-NOP-NGA 14 
assert T) 
send flcf w; 
next _s t at e . ; 
VW S_L_NOP_NGA 15

se nd E| 
next .state . ; 

} el se { 
assert rV; 
do -jw; 

next .state . ; 

} _ 

VWS_L_YOP_NGA 16 

se nd E| 
next .state . ; 

} el se { 
assert i 
do -jw; 

next .state . ; 

} 

VW E_L_YOP_NGA 17 

se nd E| 
next .state . ; 

} el se { 
assert i 
do -jw; 

next .state . ; 

} _ 

V W S _L _YOP _YGA 18 

se nd E| 
next .state . ; 

} el se { 
assert i 
do -jw; 

next .state . ; 

} 

VW E_L_YOP_YGA 19 

se nd E| 
next .state . ; 

} el se { 
assert i 
do -fw; 

next _s t at e . : 


vws_l_nop_nga 15 
assert T) 
i f ( orp) { 
assert sv; 

} 

send rflcf r; 
next .state . ; 


vws_l_yop_nga 16 
i f ( orp) { 
assert sv; 

} 

if(t){ 
send r 1 ; 

} el se { 
send rflcf r; 

} 

next .state . ; 
v w E _L _YOP _nga 1 7 
i f ( orp) { 
assert sv; 

} 

send r1; 
next .state . ; 


VWS_L _YOP _yga 1 8 
i f ( orp) { 
assert sv; 

} 

if(t){ 
send r 1 ; 

} el se { 
send rflcf r; 

} 

next .state . ; 
v w E _L -YOP _yga 1 9 
i f ( orp) { 
assert sv; 

} 

send r1; 
next _s t at e . ; 


vw s _L _NOP _NGA 15 
assert T) 
send rflcfr; 
next .state . ; 


vws -L-NOP-NGA 15 

send E| 
next .state . ; 


VW S _L _YOP _NGA 16 
send rl ; 
next .state . ; 


V W E _L _YOP _NGA 17 
i f ( NRP ) { 
send E| 
next .state . ; 

} el se { 
send rl ; 
next .state . ; 


vw s _L _YOP _YGA 18 
send rl ; 
next .state . ; 


V W E _L -YOP _YGA 19 
i f ( NRP ) { 
send E| 
next .state . ; 

} el se { 
send rl ; 
next .state . ; 

1 


VWS _L_YOP_NGA 16 

send E| 
next .state . ; 


VWE-L-YOP-NGA 17 

send E| 
next .state . ; 


VW S _L _YOP _YGA 18 

send E| 
next .state . ; 


VWE-L-YOP-YGA 19 

send E| 
next _s t at e . ; 
VC S_U_NOP_NGA 20
assert rV;
do +w;
next_state 27;

VC E_U_NOP_NGA 21
assert rV;
do +w;
next_state 28;


vc S-L-NOP-NGA 22 

se nd E| 
next .state . ; 

} el se { 
assert rQ 
do -fv; 
send r; 
next .state . ; 


vc s_u_nop_nga 20 
i f ( orp) { 
assert svc; 

} 

send r; 
next .state . ; 
vc e_u_nop_nga 21 
i f ( orp) { 
assert svc; 

} 

send r; 
next .state . ; 

vc s_l_nop_nga 22 
i f ( orp) { 
assert svc; 

} 

send r; 
next .state . ; 


vc s_u_nop_nga 20 

send r; 

next .state . ; 


vc E-U-NOP-NGA 21 
do 4s; 
send r; 

next .state 20; 


vc s_l_nop_nga 22 

send r; 

next .state . ; 


vc s _u_nop_nga 20 
assert T) 
send flcfw; 
next .state . ; 


vc e_u_nop_nga 21 
do L; 
if(ii){ 
do -tv; 

} 

send 1 a; 

next .state 24; 

vc s _l_nop_nga 22 

send E| 

next .state . ; 


vc S-L-YOP-NGA 23 

se nd E| 
next .state . ; 

} el se { 
do -tv; 
send r; 
next .state . ; 

} _ 

VC E_L_YOP-NGA 24 

se nd E| 
next .state . ; 

} el se { 
do -tv; 
send r; 
next _s t at e . ; 


VC S-L-YOP-NGA23 

i f ( orp) { 
assert svc; 

} 

send r; 
next .state . ; 


VC E-L-YOP-NGA24 

i f ( orp) { 
assert svc; 

} 

send r; 
next .state . ; 


vc s_L_YOP-NGA 23 

send r; 

next .state . ; 


vc E-L-YOP-NGA 24 
i f( nrp){ 
send E| 
next .state . ; 

} el se { 
send r; 
next _s t at e . ; 


vc S_L_YOP-NGA 23 

send E| 

next .state . ; 


vc E _L _YOP _NGA 24 

send E| 

next .state . ; 
FLCFR

RFLCFR 

R 

FLCFW 

VC S _L _YOP _YGA 25 

vc s_l_yop_yga25 

VC S-L-YOP-YGA 25 

VC S _L _YOP _YGA 25 

if(ii){ 

i f ( orp) { 

send r; 

send Ej 

send E| 
next .state . ; 

} el se { 
do -fv; 
send r; 
next .state . ; 

} 

assert svc; 

} 

send r; 
next .state . ; 

next .state . ; 

next .state . ; 

VC E_L_YOP_YGA 26 

VC E-L-YOP-YGA26 

VC E-L-YOP-YGA 26 

VC E_L _YOP _YGA 26 

if(ii){ 

i f ( orp) { 

i f( nrp){ 

send Ej 

send E| 

assert svc; 

send E| 

next .state . ; 

next .state . ; 

} 

next .state . ; 


} el se { 

send r; 

} el se { 


do -tv; 
send r; 
next .state . ; 

} 

next .state . ; 

send r; 
next .state . ; 

} 


VYC S-U-NOP-NGA 27 

vyc s_u_nop_nga27 

VYC S-U-NOP-NGA 27 

VYC S-U-NOP-NGA 27 

assert rV; 

i f ( orp) { 

send r; 

assert T) 

assert i 

assert svc; 

next .state . ; 

send flcfw; 

do -tw; 

next .state . ; 

} 

send r; 
next .state . ; 


next .state . ; 

VYC E-U-NOP-NGA 28 

vyc e_u_nop_nga28 

VYC E_U_NOP-NGA 28 

VYC E-U-NOP-NGA 28 

assert rV; 

i f ( orp) { 

do -fs; 

do L; 

assert i 

assert svc; 

send r; 

if(ii){ 

do -tw; 

} 

next .state 27; 

do 4v; 

next .state . ; 

send r; 
next .state . ; 


} 

send 1 a; 
next .state 31; 

VYC S-L-NOP-NGA 29 

vyc s_l_nop_nga29 

VYC S-L-NOP-NGA 29 

VYC S-L-NOP-NGA 29 

if(ii){ 

i f ( orp) { 

send r; 

send Ej 

send E| 
next .state . ; 

} el se { 
assert rQ 
do -tv; 
send r; 
next .state . ; 

} 

assert svc; 

} 

send r; 
next .state . ; 

next .state . ; 

next .state . ; 
INVALID 0
assert T;
if(ORP){
do L;
send l;
next_state 4;
} else {
send a;
next_state .;
}

C S_U_NOP_NGA 1 

do L; 
i f ( NRP ) { 
next .state 3; 

} el se { 
if(ii){ 
do -tv; 

next .state 23; 
} el se { 
next .state 4; 
}} 

send 1 a; 
c E-U-NOP-NGA 2 
do L; 
i f ( NRP ) { 
next .state 3; 

} el se { 
if(ii){ 
do -tv; 

next .state 23; 
} el se { 
next .state 4; 
}} 

send 1 a; 
c S-L-NOP-NGA 3 
send E| 
next .state . ; 


I NVALID 0 
next-state X; 


I NVALI D 0 
next-state X; 


I NVALI D 0 
next .state X; 


C S-U-NOP-NGA 1 
next-state X; 


C E_U_NOP_NGA 2 
next-state X; 


c s_l_nop_nga 3 
assert ic; 
do -(i ; 
i f(Oc) { 
assert T} 
send a; 
next .state 0; 

} el se { 
next _s t at e . : 


C S _U_NOP _NGA 1 
next-state X; 


C E-U-NOP_NGA 2 
next-state X; 


C S _L _NOP _NGA 3 
next-state X; 


C S _U_NOP _NGA 1 
assert i c; 
do -(i ; 

if(0c) { 

assert T} 
send ta; 
next .state 0; 

} el se { 
next .state . ; 

} 


C E_U_NOP _NGA 2 
assert i c; 
do -(i ; 
assert 0Q 
i f ( >l&lvw:) { 
send cte; 

} 

next .state . ; 


c s _L _NOP _NGA 3 
assert i c; 
do -fv; 

if(0c) { 

assert T} 
send uncv; 
next .state 9; 

} el se { 
next .state 22: 
C S_L_YOP_NGA 4

send Ej 

next .state . ; 


C S _L_YOP_NGA 4 
assert i c; 
do -(i ; 
assert OQ 
next .state . ; 


Al 

C S_L_YOP_NGA 4 
assert i c; 
do -fc; 
if(lc){ 
assert TJ 
send al; 
next .state 3; 

} el se { 


TA 

C S_L_YOP_NGA 4 
assert 1 
assert i c; 
do 4 y; 
i f( Oc) { 

if ('!){ 

send uncv; 

} 




next .state 6; 

next .state 10; 



} 

} el se { 
next .state 23; 
} 

C E_L_YOP-NGA 5 

C E-L-YOP-NGA 5 

C E-L-YOP-NGA 5 

C E_L_YOP-NGA 5 

send Ej 

assert i c; 

assert i c; 

assert 1 

next .state . ; 

do 4i ; 

do 4c; 

assert i c; 


assert OQ 

if(lc){ 

do 4 y; 


next .state . ; 

send yo; 

i f(0c) { 



next .state 2; 

next .state 11; 



} el se { 

} el se { 



next .state 7; 

next .state 24; 



} 

} 

C S -L-YOP-YGA 6 

C S-L-YOP-YGA 6 

C S _L_YOP-YGA 6 

C S _L_YOP-YGA 6 

send Ej 

assert i c; 

next .state X; 

assert 1 iy 

next .state . ; 

do 4i ; 


assert i c; 


if(lc){ 


do 4 y; 


assert 


i f(0c) { 


send al; 


if(l){ 


next .state 3; 
} else { 



next .state . ; 

} 


next .state 12; 
} el se { 
next .state 25; 
} 

C E _L_YOP-YGA 7 

C E _L_YOP-YGA 7 

C E _L_YOP-YGA 7 

C E-L-YOP-YGA 7 

send Ej 

assert i c; 

next-state X; 

assert 1 iy 

next .state . ; 

do 4i ; 
if(lc){ 
send yo; 
next _state 2; 

} else { 
next .state . ; 


assert i c; 
do 4y; 
i f(0c) { 
next .state 13; 
} el se { 
next _s t at e 26; 
L

A

AL

TA

V S _U_NOP_NGA 8 
do L; 
i f ( nrp) { 
send la; 
next .state 9; 

} el se { 
if(ii){ 
do -tv; 

} 

send la; 
next .state 10; 

} 

V S-U-NOP-NGA 8 
next-state X; 

V S _U_NOP_NGA 8 
next-state X; 

V S_U_NOP_NGA 8 
next-state X; 

V S_L_NOP_NGA 9 

send E| 
next .state . ; 

V S-L-NOP-NGA 9 
assert i v; 
do -(i ; 
if (0v){ 
assert T) 
send a; 
next .state 0; 

} el se { 
next .state . ; 

} 

V S_L_NOP_NGA 9 
next-state X; 

V S-L-NOP-NGA 9 
next-state X; 

V S_L_YOP-NGA 10 

send E| 
next .state . ; 

V S_L_YOP_NGA 10 
assert i v; 
do -(i ; 
assert 0V; 
next .state . ; 

V S-L-YOP-NGA 10 
assert i v; 
do -fc; 
i f(lvc){ 
assert T) 
send al; 
next .state 3; 

} el se { 
next .state 25; 

} 

V S-L-YOP-NGA 10 
next-state X; 

V E-L-YOP-NGA 11 

send E| 
next .state . ; 

V E_L_YOP_NGA 11 
assert i v; 
do -(i ; 
assert 0V; 
next .state . ; 

V E_L_YOP-NGA 11 
assert i v; 
do 4c; 
i f(lvc){ 
send yo; 
next _state 2; 

} el se { 
next _state 26; 

V E-L-YOP-NGA 11 
next-state X; 
V S_L_YOP_YGA 12

send E| 
next .state . ; 


V E_L_YOP_YGA 13 

send E| 
next .state . ; 


VW S _U_NOP_NGA 14 
do L; 
i f ( nrp) { 
send la; 
next .state 15; 

} el se { 
if(ii){ 
do 4v; 

} 

send la; 
next .state 16; 

} _ 

V W S _L _NOP _NGA 15 

se nd E| 
next .state . ; 


VWS-L-YOP-NGA 16 

send E| 
next .state . ; 


V S-L-YOP-YGA 12 
assert i v; 

do -(i ; 

i f (1 v ) { 

as s e r t T} 
assert 1 v; 
do 4c; 
send al; 
ne xt _s t at e 3; 

} el se { 
ne xt _s t at e . ; 

_} _ 

V E_L_YOP-YGA 13 

assert i v; 
do -(i ; 

i f (1 v ) { 

assert 1 v; 
do 4c; 
send yo; 
ne xt _s t at e 2; 

} el se { 
ne xt _s t at e . ; 

_} _ 

VWS _U_NOP-NGA 14 

next .state X; 


V S _L _YOP _YGA 12 
next-state X; 


V S _L _YOP _YGA 12 
next .state X; 


VW S _L -NOP _NGA 15 
assert i v; 
do -(i ; 
assert OV; 
next .state . ; 
v w S _L -YOP _NGA 16 
assert i v; 
do -(i ; 
assert OV; 
next _s t at e . ; 


V E-L-YOP-YGA 13 
next-state X; 


VW S _U_NOP_NGA 14 
next-state X; 


vw s-L-NOP-NGA 15 
next .state X; 


VW S _L _YOP -NGA 16 
assert i v; 
do -fc; 
assert OV; 
next .state 32: 


v E_L_YOP-YGA 13 
next .state X; 


vw s _U _NOP-NGA 14 
next .state X; 


VWS -L-NOP-NGA 15 
next .state X; 


VWS _L -YOP-NGA 16 
next .state X; 
VW E_L_YOP_NGA 17

send E| 
next .state . ; 


V W S _L _YOP _YGA 18 

send E| 
next .state . ; 


VW E_L_YOP_YGA 19 

send E| 
next .state . ; 


vc s_u_nop_nga 20 
do L; 
i f ( nrp) { 
next .state 22; 

} el se { 
if(ii){ 
do -fv; 

} 

next .state 23; 

} 

send la; 

vc E_U_NOP-NGA 21 
do L; 
i f ( nrp) { 
next .state 22; 

} el se { 
if(ii){ 
do -+v; 

} 

next .state 23; 

} 

send la: 


VWE-L-YOP-NGA 17 
assert i v; 
do -(i ; 
assert 0V; 
next .state . ; 
v w s _L _YOP _YGA 18 
assert i v; 
do -(i ; 
assert 0V; 
next .state . ; 
VWE-L-YOP-YGA 19 
assert i v; 
do -(i ; 
assert 0V; 
next .state . ; 
vc s _u_nop_nga 20 
next .state X; 


VC E_U_NOP-NGA 21 
next .state X; 


V W E _L _YOP _N GA 17 
assert i v; 
do -jc; 
assert 0V; 
next .state 33; 

VW S _L _YOP _YGA 18 
next .state X; 


VW E_L _YOP _YGA 19 
next .state X; 


vc S-U-NOP-NGA 20 
next .state X; 


vc e_u_nop_nga 21 
next .state X; 


V W E _L _YOP _N GA 17 
next .state X; 


VW S _L _YOP _YGA 18 
next .state X; 


VW E _L _YOP _YGA 19 
next .state X; 


vc s_u_nop_nga 20 
assert ic; 
do -(i ; 
if(0c) { 
assert T) 
send uncv; 
next .state 8; 

} el se { 
next .state . ; 


VC E_U_NOP-NGA 21 
assert ic; 
do -(i ; 
assert 0Q 
next .state . ; 
VC S_L_NOP_NGA 22

send E| 

next .state . ; 


vc s _L _YOP _Nga 23 

send E| 

next .state . ; 


VC E-L-YOP-NGA 24 

send E| 

next .state . ; 


vc s_l_nop_nga 22 
assert i vc; 
do -(i ; 
if (0v){ 
i f(Oc) { 
next .state X; 

} el se { 
next .state 3; 

} 

} el se { 

i f(Oc) { 

send uncv; 
next .state 9; 

} el se { 
next .state . ; 

_n_ 

VC S-L-YOP-NGA 23 
assert i vc; 
do -(i ; 
if (0v){ 
i f(Oc) { 

next .state X; 

} el se { 
next .state 4; 

} 

} el se { 

i f(Oc) { 

next .state 10; 

} el se { 
next .state . ; 

_n_ 

VC E _L _YOP _NGA 24 
assert i vc; 
do -(i ; 
if (0v){ 
i f(Oc) { 

next .state X; 

} el se { 
next _state 5; 

} 

} el se { 

i f(Oc) { 

next .state 11; 

} el se { 
next _s t at e . : 


vc s_l_nop_nga 22 
next .state X; 


vc s _L _YOP _N ga 23 
assert i vc; 
do -fc; 
if(0v){ 
next .state 6; 

} el se { 
next .state 25; 

} 


VC E_L_YOP-NGA 24 
assert i vc; 
do 4c; 
i f(lvc){ 
send yo; 
next .state 2; 

} el se { 
if (0v){ 
next .state 7; 

} el se { 
next _state 26; 
}} 


VC S _L_NOP_NGA 22 
assert i vc; 
do 4v; 
if(Oc) { 
assert T} 
send uncv; 
next .state 9; 

} el se { 
next .state . ; 

} 


VC S _L_YOP_NGA 23 
assert 1 
assert i vc; 
do 4v; 
i f (0c&3) { 
send uncv; 

} 

if(Oc) { 
next .state 10; 

} el se { 
next .state . ; 

} 


VC E_L_YOP-NGA 24 
assert 1 
assert i vc; 
do 4v; 
if(Oc) { 
next .state 11; 

} el se { 
next .state . ; 

} 
VC S_L_YOP_YGA 25

send E| 

next .state . ; 


VC E _L _YOP _YGA 26 

send E| 

next .state . ; 


vyc s_u_nop_nga 27 
do L; 
i f ( NRp) { 
next .state 29; 

} el se { 
if(ii){ 
do -fv; 

} 

next .state 30; 

} 

send la: 


VC S-L-YOP-YGA 25 
assert i vc; 
do -(i ; 
i f(lvc){ 
assert T) 
if(lv){ 
do 4c; 

} 

send al; 
next .state 3; 

} el se { 
if (0v){ 
next .state 6; 

} el se { 

i f(0c) { 

next .state 12; 

} el se { 
next .state . ; 

}» _ 

VC E_L_YOP_YGA 26 

assert i vc; 
do -(i ; 
i f(lvc){ 
if(lv){ 
do -fc; 

} 

send yo; 
next .state 2; 

} el se { 
if (0v){ 
next .state 7; 

} el se { 

i f(0c) { 

next .state 13; 

} el se { 
next .state . ; 

}» _ 

VYC S-U-NOP-NGA 27 

next .state X; 


vc s _L _YOP _YGA 25 
next .state X; 


VC E-L-YOP-YGA 26 
next-state X; 


VYC S -U-NOP _NGA 27 
next .state X; 


vc s _L _YOP _YGA 25 
assert 1 iy 
assert i vc; 
do -tv; 
i f (Oc&I) { 
send uncv; 

} 

if(0c){ 
next .state 12; 

} el s e { 
next .state . ; 

} 


VC E_L_YOP_YGA 26 
assert 1 iy 
assert i vc; 
do -tv; 
if(0c){ 
next .state 13; 

} el s e { 
next .state . ; 

} 


VYC S -U-NOP _NGA 27 
assert i c; 
do 4i ; 
if(0c){ 
assert T) 
send uncv; 
next .state 14; 

} el s e { 
next .state . ; 

} 
L

A

AL

TA

VYC E_U_NOP_NGA 28 
do L; 
i f ( NRP) { 
next .state 29; 

} el se { 
if(ii){ 
do -tv; 

l 

VYC E-U-NOP-NGA 28 
next-state X; 

VYC E_U_NOP_NGA 28 
next .state X; 

VYC E _U_NOP _NGA 28 
assert ic; 
do 4i ; 
assert 0Q 
next .state . ; 

1 

next .state 30; 




S 

send la; 




VYC S-L-NOP-NGA 29 

send E| 

next .state . ; 

VYC S_L_NOP_NGA 29 
assert i vc; 
do -(i ; 
assert 0V; 

i f(0c) { 

send uncv; 
next .state 15; 

} el se { 
next .state . ; 

} 

VYC S _L _NOP _NGA 29 
next .state X; 

VYC S _L _NOP _NGA 29 
assert i vc; 
do -tv; 
i f(0c) { 
assert 45 
send uncv; 
next .state 15; 

} el se { 
next .state . ; 

} 

VYC S_L_YOP_NGA 30 

send E| 

next .state . ; 

VYC S_L_YOP_NGA 30 
assert i vc; 
do -(i ; 
assert 0V; 

i f(0c) { 

next .state 16; 

} el se { 
next .state . ; 

} 

VYC S-L-YOP-NGA 30 
assert i vc; 
do -fc; 
assert 0V; 
next .state 32; 

VYC S _L _YOP _NGA 30 
assert 1 iy 
assert i vc; 
do -tv; 
if(0c&4){ 
send uncv; 

} 

i f(0c) { 
next .state 16; 

} el se { 
next .state . ; 

} 

VYC E_L_YOP_NGA 31 

send E| 
next .state . ; 

VYC E_L_YOP_NGA 31 
assert i vc; 
do -(i ; 
assert 0V; 

i f(0c) { 

next .state 17; 

} el se { 
next .state . ; 

} 

VYC E _L _YOP _NGA 31 
assert i vc; 
do 4c; 
assert 0V; 
next _state 33; 

VYC E_L_YOP_NGA 31 
assert 1 iy 
assert i vc; 
do -tv; 
i f(0c) { 
next .state 17; 

} el se { 
next .state . ; 

} 
L

A

AL

TA

VWC S_L_YOP_YGA 32 

send E| 

next .state . ; 

VWC S-L-YOP-YGA 32 
assert i vc; 
do -(i ; 
assert OV; 

i f(Oc) { 

next .state 18; 

} el se { 
next .state . ; 

} 

VWC. S _L _YOP _YGA 32 
next .state X; 

VWC S _L _YOP _YGA 32 
assert 1 iy 
assert i vc; 
do -+v; 
i f (0c&3) { 
send uncv; 

} 

i f(Oc) { 
next .state 18; 

} el s e { 
next .state . ; 

} 

VWC E _L _YOP _YGA 33 

VWC E-L-YOP-YGA 33 

vwc. E _L _YOP _YGA 33 

VWC E_L_YOP_YGA 33 

send E| 

assert i vc; 

next .state X; 

assert 1 iy 

next .state . ; 

do -(i ; 


assert i vc; 


assert OV; 


do -tv; 


i f(Oc) { 


i f(Oc) { 


next .state 19; 


next .state 19; 


} el se { 


} el s e { 


next .state . ; 


next .state . ; 
CTE

FLCFT

RFLCFT

CV

I NVALI D 0 
next .state X; 

I NVALI D 0 
if(t){ 
al 1 oc; 

next .state 2; 

} el se { 
send flcft; 
next .state . ; 

} 

I NVALI DO 

assert 
send rflcf t; 
next .state . ; 

I NVALI D 0 
ne xt _s t at e X; 

C S _U_NOP_NGA 1 

do -+e; 

next .state 2; 

C S _U_NOP _NGA 1 
send rt; 
next .state . ; 

C S-U-NOP-NGAl 
send rt; 
next .state . ; 

C S _U_NOP _NGA 1 
as s e r t i c; 
ne xt _s t at e . ; 

C E-U-NOP-NGA 2 
next-state X; 

C E_U_NOP_NGA 2 
send rt; 
next .state . ; 

C E-U-NOP-NGA2 
send rt; 
next .state . ; 

C E_U_NOP_NGA 2 
as s e r t i c; 
ne xt _s t at e . ; 

C S-L-NOP-NGA 3 
next-state X; 

C S_L_NOP_NGA 3 

send E| 

next .state . ; 

C S-L-NOP-NGA3 
send E| 
next .state . ; 

C S _L _NOP _NGA 3 
as s e r t i c; 
ne xt _s t at e . ; 

C S-L-YOP-NGA 4 
next-state X; 

C S_L_YOP-NGA 4 

send E| 

next .state . ; 

C S-L-YOP-NGA4 
send E| 
next .state . ; 

C S-L-YOP-NGA 4 
as s e r t i c; 
ne xt _s t at e . ; 

C E-L-YOP-NGA 5 
next-state X; 

C E_L_YOP-NGA 5 

send E| 

next .state . ; 

C E-L-YOP-NGA5 
send E| 
next .state . ; 

C E-L-YOP-NGA 5 
as s e r t i c; 
ne xt _s t at e . ; 

C S-L-YOP-YGA 6 
next-state X; 

C S-L-YOP-YGA 6 

send E| 
next .state . ; 

C S-L-YOP-YGA6 

send E| 
next .state . ; 

C S-L-YOP-YGA 6 
as s e r t i c; 
ne xt _s t at e . ; 

C E_L_YOP-YGA 7 
next .state X; 

C E-L-YOP-YGA 7 

send E| 

next .state . ; 

C E-L-YOP-YGA7 
send E| 
next .state . ; 

C E-L-YOP-YGA 7 
as s e r t i c; 
ne xt _s t at e . ; 

V S-U-NOP-NGA 8 
next-state X; 

V S-U-NOP-NGA 8 

assert 
send flcft; 
next .state . ; 

v s_u_nop_nga8 

assert 
send rflcf t; 
next .state . ; 

V S-U-NOP-NGA 8 

as s e r t 
assert i v; 
do -(c; 
send cv; 
if (0v){ 
next .state 1; 

} el se { 
next .state 20; 

} 

V S _L _NOP _NGA 9 
next .state X; 

V S-L-NOP-NGA 9 

send E| 

next .state . ; 

V S-L-NOP-NGA9 
send E| 
next .state . ; 

V S_L-NOP-NGA 9 

as s e r t 
assert i v; 
do -(c; 
send cv; 
if (0v){ 
next _s t at e 3; 

} el se { 
next .state 22; 

} 





V S_L_YOP_NGA 10 
next .state X; 


V E_L_YOP_NGA 11 
next .state X; 


V S-L-YOP-YGA 12 
next-state X; 


V E_L_YOP_YGA 13 
next .state X; 


VW S _U_NOP_NGA 14 
next .state X; 


V S-L-YOP-NGA 10 

send E| 
next .state . ; 


V E_L_YOP_NGA 11 

send E| 
next .state . ; 


V S _L _YOP _YGA 12 

send E| 
next .state . ; 


V E-L-YOP-YGA 13 

send E| 
next .state . ; 


VW S -U-NOP _NGA 14 
assert T} 
send flcf t; 
next _s t at e . ; 


V S-L-YOP-NGAlO 

send E| 

next .state . ; 


V E-L-YOP-NGAll 

send E| 

next .state . ; 


v s_l_yop_yga12 

send E| 

next .state . ; 


v e_l_yop_yga13 

send E| 

next .state . ; 


v w s _u_nop _nga1 4 
assert T} 
send rflcf t; 
next _s t at e . ; 


V S -L-YOP-NGA 10 

assert i v; 

do -(e; 

if ('!){ 

send cv; 

} 

if (0v){ 
next .state 4; 

} el se { 
next .state 23; 

_}_ 

V E _L_YOP_NGA 11 

assert i v; 
do -(c; 
if (0v){ 
next .state 5; 

} el se { 
next .state 24; 

_}_ 

V S _L_YOP_YGA 12 

assert i v; 
do 4c; 

if ('!){ 

send cv; 

} 

if (0v){ 
next .state 6; 

} el se { 
next .state 25; 

_}_ 

V E _L _YOP _YGA 13 

assert i v; 
do 4c; 
if (0v){ 
next .state 7; 

} el se { 
next .state 26; 

_}_ 

VW S _U_NOP_NGA 14 

assert T} 
assert i v; 
do 4c; 

send rdaw, c; 
send cv; 
if (0v){ 
next _s t at e 1; 

} el se { 
next .state 20: 
VW S_L_NOP_NGA 15
next .state X; 


VW S _L_NOP_NGA 15 

send E| 
next .state . ; 


VW S _L _YOP _NGA 16 
next .state X; 


VWS-L-YOP-NGA 16 

send E| 
next .state . ; 


VW E_L_YOP_NGA 17 VWE_L_YOP_NGA 17 

next .state X; send E| 

next .state . ; 


V W S _L _YOP _YGA 18 
next-state X; 


VWS_L _YOP _YGA 18 

send E| 
next .state . ; 


vws_l_nop_nga15 

send E| 

next .state . ; 


vws_l_yop_nga16 

send E| 

next .state . ; 


v w E _L _YOP _nga1 7 

send E| 

next .state . ; 


vw s _L _YOP _yga1 8 

send E| 

next .state . ; 


vws -L-NOP-NGA 15 
assert T} 
assert i v; 
do de; 

send rdaw, c; 
send cv; 
if (0v){ 
next .state 3; 

} el se { 
next .state 22; 

_} _ 

VWS _L_YOP_NGA 16 

assert i v; 
do de; 

if ('!){ 

send cv; 

} 

send rdaw, c; 
if (0v){ 
next .state 4; 

} el se { 
next .state 23; 

_}_ 

VWE_L_YOP_NGA 17 

assert i v; 
do de; 

send rdaw, c; 
if (0v){ 
next .state 5; 

} el se { 
next .state 24; 

_} _ 

VW S _L _YOP _YGA 18 

assert i v; 
do de; 

if ('!){ 

send cv; 

} 

send rdaw, c; 
if (0v){ 
next _s t at e 6; 

} el se { 
next .state 25: 
CTE

FLCFT

RFLCFT

CV

VW E_L_YOP_YGA 19 
next .state X; 

V W E _L _YOP _YGA 19 

send E| 
next .state . ; 

V W E _L _YOP _YGAl 9 

send E| 

next .state . ; 

VWE-L-YOP-YGA 19 
assert i v; 
do -(c; 

send rdaw, c; 
if (0v){ 
next .state 7; 

} el se { 
next .state 26; 

} 

VC S _U_NOP_NGA 20 
do -+e; 

next .state 21; 

VC S -U-NOP-NGA 20 
send rt ; 
next .state . ; 

vc s_u_nop_nga 20 
send rt ; 
next .state . ; 

VC S_U_NOP_NGA 20 
assert i vc; 
do -(c; 
if (0v){ 
next .state 1; 

} el se { 
next .state . ; 

} 

VC E_U_NOP_NGA 21 
next-state X; 

VC E -U-NOP-NGA 21 
send rt ; 
next .state . ; 

vc e_u_nop_nga 21 
send rt ; 
next .state . ; 

VC E_U_NOP_NGA 21 
assert i vc; 
do -(c; 
if (0v){ 
next .state 2; 

} el se { 
next .state . ; 

} 

VC S-L-NOP-NGA 22 
next-state X; 

VC S -L-NOP-NGA 22 

send E| 
next .state . ; 

vc s_l_nop_nga22 

send E| 

next .state . ; 

VC S _L_NOP_NGA 22 
assert i vc; 
do -(c; 
if (0v){ 
next .state 3; 

} el se { 
next .state . ; 

} 

VC S-L-YOP-NGA 23 
next-state X; 

VC S -L-YOP-NGA 23 

send E| 
next .state . ; 

vc s_l_yop_nga23 

send E| 

next .state . ; 

VC S _L_YOP_NGA 23 
assert i vc; 
do -(c; 
if (0v){ 
next .state 4; 

} el se { 
next .state . ; 

} 

VC E-L-YOP-NGA 24 
next .state X; 

VC E-L-YOP-NGA 24 

send E| 

next .state . ; 

vc e_l_yop_nga24 

send E| 

next .state . ; 

VC E_L-YOP_NGA 24 
assert i vc; 
do -(c; 
if (0v){ 
next _s t at e 5; 

} el se { 
next .state . ; 

} 





CTE

FLCFT

RFLCFT

CV

VC S _L _YOP _YGA 25 
next .state X; 

VC S-L-YOP-YGA 25 

send E| 

next .state . ; 

vc s_l_yop_yga25 

send E| 

next .state . ; 

VC S _L _YOP _YGA 25 
assert i vc; 
do 4c; 

if ('!){ 

send cv; 




y 

if (0v){ 
next .state 6; 

} el se { 
next .state . ; 

} 

VC E_L_YOP_YGA 26 
next .state X; 

VC E-L-YOP-YGA 26 

send E| 
next .state . ; 

VC E-L-YOP-YGA26 

send E| 
next .state . ; 

VC E_L _YOP _YGA 26 
assert i vc; 
do 4c; 
if (0v){ 
next .state 7; 

} el se { 
next .state . ; 

} 

VYC S_U_NOP_NGA 27
do 4e;
next_state 28;

VYC S_U_NOP_NGA 27
send rt;
next_state .;

VYC S_U_NOP_NGA 27
send rt;
next_state .;

VYC S_U_NOP_NGA 27
assert ivc;
do -fc;
send rdaw, c;
if (0v) {
next_state 1;
} else {
next_state 20;
}

VYC E_U_NOP_NGA 28
next_state X;

VYC E_U_NOP_NGA 28
send rt;
next_state .;

VYC E_U_NOP_NGA 28
send rt;
next_state .;

VYC E_U_NOP_NGA 28
assert ivc;
do 4c;
send rdaw, c;
if (0v) {
next_state 2;
} else {
next_state 21;
}

VYC S_L_NOP_NGA 29
next_state X;

VYC S_L_NOP_NGA 29
send E;
next_state .;

VYC S_L_NOP_NGA 29
send E;
next_state .;

VYC S_L_NOP_NGA 29
assert ivc;
do 4c;
send rdaw, c;
if (0v) {
next_state 3;
} else {
next_state 22;
}

VYC S_L_YOP_NGA 30
next_state X;

VYC S_L_YOP_NGA 30
send E;
next_state .;

VYC S_L_YOP_NGA 30
send E;
next_state .;

VYC S_L_YOP_NGA 30
assert ivc;
do 4c;
send rdaw, c;
if (0v) {
next_state 4;
} else {
next_state 23;
}

VYC E_L_YOP_NGA 31
next_state X;

VYC E_L_YOP_NGA 31
send E;
next_state .;

VYC E_L_YOP_NGA 31
send E;
next_state .;

VYC E_L_YOP_NGA 31
assert ivc;
do -fc;
send rdaw, c;
if (0v) {
next_state 5;
} else {
next_state 24;
}

VYC S_L_YOP_YGA 32
next_state X;

VYC S_L_YOP_YGA 32
send E;
next_state .;

VYC S_L_YOP_YGA 32
send E;
next_state .;

VYC S_L_YOP_YGA 32
assert ivc;
do 4c;
send rdaw, c;
if ('!) {
send cv;
}
if (0v) {
next_state 6;
} else {
next_state 25;
}

VYC E_L_YOP_YGA 33
next_state X;

VYC E_L_YOP_YGA 33
send E;
next_state .;

VYC E_L_YOP_YGA 33
send E;
next_state .;

VYC E_L_YOP_YGA 33
assert ivc;
do 4c;
send rdaw, c;
if (0v) {
next_state 7;
} else {
next_state 26;
}
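Each cell above pairs a state with an incoming message and lists the messages sent and the next state. The following Python sketch encodes a few unconditional cells (from states 20, 21 and 27) as a lookup keyed on (state number, message). It is an illustration under stated assumptions, not the protocol's actual implementation. The names `Cell`, `TABLE`, `step` and `IllegalTransition` are hypothetical. A next-state entry of `.` is assumed to mean that the node stays in its current state, and `X` is assumed to mark a message that cannot arrive in that state. Conditional cells (those testing `0v`) and the `do` statements are not modelled, because their operands are not defined in this part of the table.

```python
from typing import Dict, List, NamedTuple, Tuple


class Cell(NamedTuple):
    """One table cell: the messages it sends and its next-state entry."""
    sends: Tuple[str, ...]  # arguments of `send` statements, copied verbatim
    next_state: str         # a state number, "." or "X"


class IllegalTransition(Exception):
    """Raised for cells marked X (assumed: message not expected in this state)."""


# Hypothetical encoding of a few unconditional cells of the parent-node table.
# Keys are (state number, incoming message). The `do` statements in the
# CTE cells are not modelled; only sends and next states are kept.
TABLE: Dict[Tuple[int, str], Cell] = {
    (20, "CTE"): Cell((), "21"),        # VC S_U_NOP_NGA 20
    (20, "FLCFT"): Cell(("rt",), "."),
    (20, "RELCFT"): Cell(("rt",), "."),
    (21, "CTE"): Cell((), "X"),         # VC E_U_NOP_NGA 21
    (27, "CTE"): Cell((), "28"),        # VYC S_U_NOP_NGA 27
    (27, "FLCFT"): Cell(("rt",), "."),
}


def step(state: int, message: str) -> Tuple[int, List[str]]:
    """Return (new state, messages to send) for one encoded cell."""
    cell = TABLE[(state, message)]
    if cell.next_state == "X":
        raise IllegalTransition(f"state {state} does not accept {message}")
    # Assumption: "." keeps the node in its current state.
    if cell.next_state == ".":
        return state, list(cell.sends)
    return int(cell.next_state), list(cell.sends)


if __name__ == "__main__":
    print(step(20, "FLCFT"))  # (20, ['rt'])
    print(step(27, "CTE"))    # (28, [])
```

Under these assumptions, `step(20, "FLCFT")` returns `(20, ['rt'])`: the node answers with `rt` and remains in state 20. `step(21, "CTE")` raises `IllegalTransition` because that cell is marked `X`.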
RD

UNCV

RT

WO

C S_L_YOP_YGA 6
next_state .;

C S_L_YOP_YGA 6
assert ic;
do -fv;
if (0c&0) {
send uncv;
}
if (0c) {
next_state 12;
} else {
next_state 25;
}

C S_L_YOP_YGA 6
send E;
next_state .;

C S_L_YOP_YGA 6
next_state X;

C E_L_YOP_YGA 7
next_state X;

C E_L_YOP_YGA 7
assert ic;
do -tv;
if (0c) {
next_state 13;
} else {
next_state 26;
}

C E_L_YOP_YGA 7
send E;
next_state .;

C E_L_YOP_YGA 7
next_state X;

V S_U_NOP_NGA 8
send rdav, c;
next_state 1;

V S_U_NOP_NGA 8
next_state X;

V S_U_NOP_NGA 8
assert T;
send rflcft;
next_state .;

V S_U_NOP_NGA 8
next_state X;

V S_L_NOP_NGA 9
send rdav, c;
next_state 3;

V S_L_NOP_NGA 9
next_state X;

V S_L_NOP_NGA 9
send E;
next_state .;

V S_L_NOP_NGA 9
next_state X;

V S_L_YOP_NGA 10
send rdavL, c;
if (0c) {
next_state .;
} else {
next_state 23;
}

V S_L_YOP_NGA 10
next_state X;

V S_L_YOP_NGA 10
send E;
next_state .;

V S_L_YOP_NGA 10
next_state X;

V E_L_YOP_NGA 11
next_state X;

V E_L_YOP_NGA 11
next_state X;

V E_L_YOP_NGA 11
send E;
next_state .;

V E_L_YOP_NGA 11
next_state X;

V S_L_YOP_YGA 12
send rdavL, c;
if (0c) {
next_state .;
} else {
next_state 25;
}

V S_L_YOP_YGA 12
next_state X;

V S_L_YOP_YGA 12
send E;
next_state .;

V S_L_YOP_YGA 12
next_state 5;

V E_L_YOP_YGA 13
next_state X;

V E_L_YOP_YGA 13
next_state X;

V E_L_YOP_YGA 13
send E;
next_state .;

V E_L_YOP_YGA 13
next_state 5;

VW S_U_NOP_NGA 14
send rdavw, c;
next_state 1;

VW S_U_NOP_NGA 14
next_state X;

VW S_U_NOP_NGA 14
assert T;
send rflcft;
next_state .;

VW S_U_NOP_NGA 14
next_state X;

VW S_L_NOP_NGA 15
send rdavw, c;
next_state 3;

VW S_L_NOP_NGA 15
next_state X;

VW S_L_NOP_NGA 15
send E;
next_state .;

VW S_L_NOP_NGA 15
next_state X;

VW S_L_YOP_NGA 16
send rdavwL, c;
next_state 23;

VW S_L_YOP_NGA 16
next_state X;

VW S_L_YOP_NGA 16
send E;
next_state .;

VW S_L_YOP_NGA 16
next_state X;

VW E_L_YOP_NGA 17
next_state X;

VW E_L_YOP_NGA 17
next_state X;

VW E_L_YOP_NGA 17
send E;
next_state .;

VW E_L_YOP_NGA 17
next_state X;

VW S_L_YOP_YGA 18
send rdavwL, c;
next_state 25;

VW S_L_YOP_YGA 18
next_state X;

VW S_L_YOP_YGA 18
send E;
next_state .;

VW S_L_YOP_YGA 18
next_state X;

VW E_L_YOP_YGA 19
next_state X;

VW E_L_YOP_YGA 19
next_state X;

VW E_L_YOP_YGA 19
send E;
next_state .;

VW E_L_YOP_YGA 19
next_state X;

VC S_U_NOP_NGA 20
next_state .;

VC S_U_NOP_NGA 20
assert ic;
do -+v;
if (0c) {
assert T;
send uncv;
next_state 8;
} else {
next_state .;
}

VC S_U_NOP_NGA 20
send rt;
next_state .;

VC S_U_NOP_NGA 20
next_state X;

VC E_U_NOP_NGA 21
next_state X;

VC E_U_NOP_NGA 21
assert ic;
do -tv;
assert 0Q
next_state .;

VC E_U_NOP_NGA 21
send rt;
next_state .;

VC E_U_NOP_NGA 21
next_state X;

VC S_L_NOP_NGA 22
next_state .;

VC S_L_NOP_NGA 22
assert ic;
do -tv;
if (0c&0) {
send uncv;
}
if (0c) {
next_state 9;
} else {
next_state .;
}

VC S_L_NOP_NGA 22
send E;
next_state .;

VC S_L_NOP_NGA 22
next_state X;
VC S_L_YOP_NGA 23
next_state .;

VC S_L_YOP_NGA 23
assert ic;
do -fv;
if (0c&3) {
send uncv;
}
if (0c) {
next_state 10;
} else {
next_state .;
}

VC S_L_YOP_NGA 23
send E;
next_state .;

VC S_L_YOP_NGA 23
next_state X;

VC E_L_YOP_NGA 24
next_state X;

VC E_L_YOP_NGA 24
assert ic;
do -fv;
if (0c) {
next_state 11;
} else {
next_state .;
}

VC E_L_YOP_NGA 24
send E;
next_state .;

VC E_L_YOP_NGA 24
next_state X;

VC S_L_YOP_YGA 25
if (lc&lc) {
send rdavL, c;
next_state 6;
} else {
next_state .;
}

VC S_L_YOP_YGA 25
assert ic;
do -fv;
if (0c&3) {
send uncv;
}
if (0c) {
next_state 12;
} else {
next_state .;
}

VC S_L_YOP_YGA 25
send E;
next_state .;

VC S_L_YOP_YGA 25
next_state X;

VC E_L_YOP_YGA 26
next_state X;

VC E_L_YOP_YGA 26
assert ic;
do -fv;
if (0c) {
next_state 13;
} else {
next_state .;
}

VC E_L_YOP_YGA 26
send E;
next_state .;

VC E_L_YOP_YGA 26
next_state X;

VYC S_U_NOP_NGA 27
next_state .;

VYC S_U_NOP_NGA 27
assert ic;
do -tv;
if (0c) {
assert T;
send uncv;
next_state 14;
} else {
next_state .;
}

VYC S_U_NOP_NGA 27
send rt;
next_state .;

VYC S_U_NOP_NGA 27
next_state 5;

VYC E_U_NOP_NGA 28
next_state X;

VYC E_U_NOP_NGA 28
assert ic;
do -+v;
assert 0Q
next_state .;

VYC E_U_NOP_NGA 28
send rt;
next_state .;

VYC E_U_NOP_NGA 28
next_state X;

VYC S_L_NOP_NGA 29
next_state .;

VYC S_L_NOP_NGA 29
assert ic;
do -+v;
if (0c&I) {
send uncv;
}
if (0c) {
next_state 15;
} else {
next_state .;
}

VYC S_L_NOP_NGA 29
send E;
next_state .;

VYC S_L_NOP_NGA 29
next_state X;

VYC S_L_YOP_NGA 30
next_state .;

VYC S_L_YOP_NGA 30
assert ic;
do -fv;
if (0c&0) {
send uncv;
}
if (0c) {
next_state 16;
} else {
next_state .;
}

VYC S_L_YOP_NGA 30
send E;
next_state .;

VYC S_L_YOP_NGA 30
next_state

VYC E_L_YOP_NGA 31
next_state X;

VYC E_L_YOP_NGA 31
assert ic;
do -tv;
if (0c) {
next_state 17;
} else {
next_state .;
}

VYC E_L_YOP_NGA 31
send E;
next_state .;

VYC E_L_YOP_NGA 31
next_state

VYC S_L_YOP_YGA 32
if (lc&lc) {
send rdavYii, c;
next_state 6;
} else {
next_state .;
}

VYC S_L_YOP_YGA 32
assert ic;
do -tv;
if (0c&0) {
send uncv;
}
if (0c) {
next_state 18;
} else {
next_state .;
}

VYC S_L_YOP_YGA 32
send E;
next_state .;

VYC S_L_YOP_YGA 32
next_state X;